Speech Recognition HarmonyOS SDK
Last updated: 2024-12-31
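1. About this document
Document name | Speech Recognition Integration Guide |
---|---|
Platform | HarmonyOS |
Submission date | 2024-12-30 |
Overview | This document is the user guide for the HarmonyOS SDK of the Baidu Speech Open Platform. It describes how to use the short-speech and long-speech recognition interfaces. The SDK uses a streaming protocol throughout, so audio is processed while the user is still speaking. This differs from the REST API, which requires uploading the complete recording file. |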
2. Version information
Name | Version |
---|---|
Speech recognition | 1.0.0 |
System support | HarmonyOS 5.0.0 (API Level 12) or later |
Architecture support | arm64-v8a, armeabi-v7a |
3. SDK overview
3.1 Package contents
File name | Description |
---|---|
doc/Baidu_ASR_SDK_Harmony_Manual.md | This document |
har | Speech recognition SDK HAR library |
BaiduAsrDemo | Sample project |
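4. Online recognition call flow
4.1 Initializing the SDK
Initialization takes three parameters: the context, the product pid, and the cuid (optional).
Sample code
SpeechEventManager.getInstance().initSdk(
  getContext(this) as common.UIAbilityContext, // context
  "1234", // product pid
  "cuid" // cuid: optional
)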
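4.2 Recognition
4.2.1 Starting recognition
this.asrListener and the inline callback are two alternative ways of receiving events. Use whichever suits your code; passing one of them is enough.
let startParams: StartParamsAsr = new StartParamsAsr()
startParams.pid = 123 // recognition environment pid: optional, defaults to 1537
startParams.authInfo = {ak: "apikey", sk: "secretkey"} // authentication parameters
startParams.asrType = SpeechAsrType.SHORT // SpeechAsrType.SHORT: single utterance; SpeechAsrType.TOUCH: press and hold; SpeechAsrType.MULTI: full duplex; SpeechAsrType.TRANSLITERATE: long-speech transcription
startParams.earlyReturn = 1 // whether to enable early return
startParams.acceptAudioVolume = true // whether to receive volume callbacks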
interface Result {
  word: string[];
  confident: number[];
}
interface RecognizeParams {
  err_no: number;
  result: Result;
  asr_align_begin: number;
  asr_align_end: number;
  raf: number;
  early_return_duration_frame: number;
  corpus_no: number;
  sn: string;
  force_align_result: string;
  confidence_status: number;
  product_id: number;
  product_line: string;
  other_params: string;
  result_type: string;
  speak_speed: number;
  voice_power: number;
}
SpeechEventManager.getInstance().startAsr(startParams, this.asrListener,
  (asrState: string, params: string = '', audioData: ArrayBuffer) => {
    let showMsg = "====" + asrState + ", " + params
    let msg = "Asr callback: " + showMsg
    LogUtil.d(msg)
    if (asrState === SpeechAsrState.ASR_AUDIO_DATA) {
    } else if (asrState === SpeechAsrState.ASR_AUDIO_VOLUME_LEVEL) {
    } else if (asrState === SpeechAsrState.ASR_READY) {
    } else if (asrState === SpeechAsrState.ASR_PARTIAL) { // partial result
      const parsedResponse = JSON.parse(params) as RecognizeParams
      console.log("Partial result: " + parsedResponse?.result?.word[0])
    } else if (asrState === SpeechAsrState.ASR_FINAL) { // final result
    } else if (asrState === SpeechAsrState.ASR_TTS) { // tts
    } else if (asrState === SpeechAsrState.ASR_THIRD) { // third-party data
    } else if (asrState === SpeechAsrState.ASR_FINISH) { // recognition finished
      const parsedResponse = JSON.parse(params) as RecognizeParams
      console.log("Final result: " + parsedResponse?.result?.word[0])
    } else if (asrState === SpeechAsrState.ASR_EXIT) {
    }
  })
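The callback casts the output of JSON.parse straight to RecognizeParams, which throws when params is empty (its default value) or is not valid JSON. A minimal sketch of a more defensive extraction follows, assuming the payload carries candidates in result.word as the Result interface declares. firstWord, ResultLike, and PayloadLike are hypothetical names for illustration and are not part of the SDK. The sketch is plain TypeScript; ArkTS is stricter about types such as unknown, so it may need adjusting inside a HarmonyOS project.

```typescript
// Hypothetical helper (not part of the SDK): safely extract the top
// candidate from a payload shaped like RecognizeParams.
interface ResultLike {
  word?: string[];
}

interface PayloadLike {
  err_no?: number;
  result?: ResultLike | null;
}

function firstWord(params: string): string | undefined {
  if (params.length === 0) {
    return undefined; // the callback's default value for params is ''
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(params);
  } catch {
    return undefined; // params was not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return undefined; // e.g. the JSON literals null, 1 or "text"
  }
  const words = (parsed as PayloadLike).result?.word;
  // Only return a value when a non-empty candidate list is present.
  return Array.isArray(words) && words.length > 0 ? words[0] : undefined;
}
```

Inside the callback, firstWord(params) could stand in for the cast plus parsedResponse?.result?.word[0], with undefined treated as "no text in this event".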
4.2.1.1 Callback samples
asr.start, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.ready, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.begin, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "err_no": 0, "best_result": "我"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "我放"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音"}
asr.partial, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音乐"}
asr.end, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
asr.final_result, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5", "best_result": "播放音乐。"}
asr.finish, {"sn":"a6eb51fb-d44a-465b-9129-9408ae4d7df5","err_no":0,"err":{"errorCode":0,"desc":"Speech Recognize success."}}
asr.exit, {"sn": "a6eb51fb-d44a-465b-9129-9408ae4d7df5"}
Recognition failure:
asr.final_result, {"sn": "162a2ef1-4551-41e9-aac3-93496500b409", "err_no": -3005, "err_msg": "asr server not find effective speech"}
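When working with captured logs in the form shown in these samples (an event name, a comma, then a JSON payload), it can help to split each line and track the latest best_result. The following is a sketch under that assumption only: the line layout is taken from the samples, and parseEventLine and latestText are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical log-processing helpers (not part of the SDK).
interface AsrEvent {
  event: string;
  payload: Record<string, unknown>;
}

// Split "asr.partial, {...}" into the event name and its parsed payload.
function parseEventLine(line: string): AsrEvent | undefined {
  const comma = line.indexOf(",");
  if (comma < 0) {
    return undefined; // no separator between event name and payload
  }
  const event = line.slice(0, comma).trim();
  try {
    const payload: unknown = JSON.parse(line.slice(comma + 1));
    if (typeof payload !== "object" || payload === null || Array.isArray(payload)) {
      return undefined; // the payload must be a JSON object
    }
    return { event, payload: payload as Record<string, unknown> };
  } catch {
    return undefined; // the payload was not valid JSON
  }
}

// Return the most recent best_result seen in a sequence of log lines.
function latestText(lines: string[]): string | undefined {
  let text: string | undefined;
  for (const line of lines) {
    const best = parseEventLine(line)?.payload["best_result"];
    if (typeof best === "string") {
      text = best;
    }
  }
  return text;
}
```

Events without a best_result field, such as asr.start or a failed asr.final_result, leave the tracked text unchanged.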
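4.2.1.1.1 asr.finish
{
  sn: string,
  err_no: number,
  err: {
    errorCode: number,
    desc: string
  }
}
sn: the sn of the query that this asr.finish event belongs to.
err_no: error code; 0 when recognition finishes normally.
err: {
  errorCode: error code, same value as err_no
  desc: error description
}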
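As a small illustration of how the asr.finish payload might be consumed, the sketch below reduces it to a success flag and a message. It relies only on sn, err_no, and err.desc as described above. summarizeFinish, FinishPayload, and FinishSummary are hypothetical names, not SDK types.

```typescript
// Hypothetical helper (not part of the SDK): summarize an asr.finish payload.
interface FinishPayload {
  sn: string;
  err_no: number;
  err?: { desc?: string };
}

interface FinishSummary {
  ok: boolean;     // true when err_no is 0
  sn: string;      // the sn of the query the event belongs to
  message: string; // err.desc, or an empty string when it is absent
}

function summarizeFinish(params: string): FinishSummary {
  // Throws if params is not valid JSON; callers may want to catch that.
  const payload = JSON.parse(params) as FinishPayload;
  return {
    ok: payload.err_no === 0,
    sn: payload.sn,
    message: payload.err?.desc ?? "",
  };
}
```

A caller could branch on ok and, on failure, log err_no together with message.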
4.2.2 Setting parameters dynamically
let configParams: ConfigParamsAsr = new ConfigParamsAsr()
configParams.enableLongPress = true
SpeechEventManager.getInstance().configAsr(configParams)
4.2.3 Stopping recognition
SpeechEventManager.getInstance().stopAsr()
4.2.4 Canceling recognition
SpeechEventManager.getInstance().cancelAsr()
4.3 Error code mapping
Error event | HarmonyOS error code | Corresponding Android event | Android error code | Description |
---|---|---|---|---|
ERROR_VAD_NO_SPEECH | 1001 | ERROR_AUDIO_VAD_NO_SPEECH | 3101 | No start of speech detected |
ERROR_VAD_INIT_ERROR | 1002 | ERROR_AUDIO_VAD_INCORRECT | 3100 | VAD initialization failed |
ERROR_NETWORK_FAIL_CONNECT | 2001 | ERROR_NETWORK_FAIL_CONNECT | 2000 | Network connection failed |
ERROR_NETWORK_LINK_DOWN | 2002 | N/A | N/A | Network connection dropped (during recognition, the system WebSocket fired its close callback) |
ERROR_NETWORK_ERROR | 2100 | ERROR_NETWORK_NOT_AVAILABLE | 2100 | Network error |
ERROR_AUDIO_RECORDER_OPEN | 3001 | ERROR_AUDIO_RECORDER_OPEN | 3001 | Failed to open the audio recorder |
ERROR_USER_CANCEL | 7002 | ERROR_EMPTY_RESULT | 7002 | User called exitAsr |
ERROR_AUDIO_RECORDER_NO_PERMISSION | 9001 | ERROR_NO_RECORD_PERMISSION | 9001 | No recording permission |
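For code that has to report these errors, or that is shared with an Android implementation, the table can be kept as a lookup keyed by the HarmonyOS code. This is a sketch only: the values are copied from the table, while ASR_ERRORS, AsrErrorInfo, and describeAsrError are hypothetical names that the SDK does not provide.

```typescript
// Hypothetical lookup (not part of the SDK) built from the mapping table.
interface AsrErrorInfo {
  name: string;
  androidCode?: number; // omitted where the table lists no Android equivalent
  description: string;
}

const ASR_ERRORS = new Map<number, AsrErrorInfo>([
  [1001, { name: "ERROR_VAD_NO_SPEECH", androidCode: 3101, description: "No start of speech detected" }],
  [1002, { name: "ERROR_VAD_INIT_ERROR", androidCode: 3100, description: "VAD initialization failed" }],
  [2001, { name: "ERROR_NETWORK_FAIL_CONNECT", androidCode: 2000, description: "Network connection failed" }],
  [2002, { name: "ERROR_NETWORK_LINK_DOWN", description: "Network connection dropped" }],
  [2100, { name: "ERROR_NETWORK_ERROR", androidCode: 2100, description: "Network error" }],
  [3001, { name: "ERROR_AUDIO_RECORDER_OPEN", androidCode: 3001, description: "Failed to open the audio recorder" }],
  [7002, { name: "ERROR_USER_CANCEL", androidCode: 7002, description: "User called exitAsr" }],
  [9001, { name: "ERROR_AUDIO_RECORDER_NO_PERMISSION", androidCode: 9001, description: "No recording permission" }],
]);

// Build a readable message for a HarmonyOS error code.
function describeAsrError(code: number): string {
  const info = ASR_ERRORS.get(code);
  if (info === undefined) {
    return `Unknown error ${code}`; // codes outside the table, e.g. server-side errors
  }
  const android = info.androidCode === undefined
    ? "no Android equivalent"
    : `Android ${info.androidCode}`;
  return `${info.name} (${code}, ${android}): ${info.description}`;
}
```

Codes that are not in the table fall through to the "Unknown error" branch, so the function never throws.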
4.4 Debugging
4.4.1 Log output
// Disable logging.
LogUtil.isLog = false;
// Enable logging.
LogUtil.isLog = true;
4.4.2 Saving debug audio
// Set a save path; debug audio is then saved there.
const fileRootPath: string = this.context.filesDir
ConfigUtil.setDebugAudioPath(fileRootPath)