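Summary: This article explains in detail how to call the Baidu speech recognition API from Python, covering environment setup, API calls, error handling, and optimization tips. It is aimed at developers who want to add speech-to-text quickly.
Speech recognition has become a core part of human-computer interaction and is widely used in intelligent customer service, voice input, meeting transcription, and similar scenarios. With high accuracy, low latency, and a rich feature set (such as mixed Chinese-English recognition and real-time streaming recognition), the Baidu speech recognition API is one of the cloud services developers most often choose. This article explains how to call the Baidu API from Python to convert audio files to text efficiently.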
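Prerequisites: an API Key and Secret Key, the requests library (for HTTP requests), and the json module (for parsing responses; it ships with the Python standard library).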
pip install requests
To process audio files, you can additionally install pydub (which requires FFmpeg):
pip install pydub
The Baidu API authenticates requests with an Access Token, which is valid for 30 days and is obtained by exchanging the API Key and Secret Key.
import requests
import base64
import json

def get_access_token(api_key, secret_key):
    auth_url = f"https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={api_key}&client_secret={secret_key}"
    response = requests.get(auth_url)
    if response.status_code == 200:
        return response.json().get("access_token")
    else:
        raise Exception("Failed to get access token")
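Because the token stays valid for a long time, it does not need to be requested before every call. The following is a minimal sketch of a cache that reuses a token until shortly before it expires. The class name TokenCache, the refresh margin, and the injectable clock are illustrative assumptions, not part of the Baidu API; the 30-day lifetime is taken from the text above.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, ttl_seconds=30 * 24 * 3600,
                 margin_seconds=3600, clock=time.time):
        self._fetch_token = fetch_token  # zero-argument callable returning a token string
        self._ttl = ttl_seconds          # assumed lifetime: the 30 days stated above
        self._margin = margin_seconds    # refresh this long before the assumed expiry
        self._clock = clock              # injectable so the logic can be tested offline
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one if none exists or it is near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token
```

One way to wire this up would be TokenCache(lambda: get_access_token(api_key, secret_key)), calling get() wherever a token is needed.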
This mode is suitable for audio files of up to one minute and returns the recognition result directly.
def recognize_short_audio(access_token, audio_path, format="wav", rate=16000):
    # Read the audio file as binary
    with open(audio_path, "rb") as f:
        audio_data = f.read()

    # Build the request URL
    url = f"https://vop.baidu.com/server_api?cuid=your_device_id&token={access_token}"

    # Build the request headers
    headers = {
        "Content-Type": "application/json",
    }

    # Build the request body (audio is Base64-encoded)
    params = {
        "format": format,
        "rate": rate,
        "channel": 1,
        "token": access_token,
        "cuid": "your_device_id",  # unique device identifier
        "len": len(audio_data),
        "speech": base64.b64encode(audio_data).decode("utf-8"),
    }

    response = requests.post(url, headers=headers, data=json.dumps(params))
    if response.status_code == 200:
        result = response.json()
        if result.get("err_no") == 0:
            return result["result"][0]  # return the recognized text
        else:
            raise Exception(f"API Error: {result.get('err_msg')}")
    else:
        raise Exception("HTTP Request Failed")
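The request body can also be assembled by a separate helper, which makes it possible to inspect the payload without sending anything. The sketch below mirrors the fields used in the function above; the helper name build_asr_payload and its parameter names are hypothetical. It highlights one detail that is easy to get wrong: len is the byte length of the raw audio, not of the Base64 string.

```python
import base64
import json


def build_asr_payload(audio_data, access_token, cuid, audio_format="wav", rate=16000):
    """Return the JSON string for the request body, mirroring the fields used above."""
    payload = {
        "format": audio_format,
        "rate": rate,
        "channel": 1,
        "token": access_token,
        "cuid": cuid,
        "len": len(audio_data),  # byte length of the raw audio, before Base64 encoding
        "speech": base64.b64encode(audio_data).decode("utf-8"),
    }
    return json.dumps(payload)
```

The returned string could then be passed as the data argument of requests.post, in place of the inline json.dumps(params) call.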
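format: the audio format (e.g. wav, mp3).
rate: the sampling rate (8000 or 16000).
cuid: a unique device identifier; a MAC address or a random string is recommended.

The next mode is suitable for long audio (such as meeting recordings). The file must first be uploaded to Baidu's servers, after which the result is retrieved by polling.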
import time

def recognize_long_audio(access_token, audio_path):
    # 1. Request a file upload URL
    upload_url = "https://vop.baidu.com/pro_api"
    headers = {"Content-Type": "application/json"}
    with open(audio_path, "rb") as f:
        audio_len = len(f.read())
    params = {
        "token": access_token,
        "cuid": "your_device_id",
        "len": audio_len,
        "format": "wav",
        "rate": 16000,
        "channel": 1,
    }
    response = requests.post(upload_url, headers=headers, data=json.dumps(params))
    if response.status_code != 200:
        raise Exception("Upload URL Request Failed")

    # 2. Upload the file itself (requires Baidu's SDK or a chunked upload)
    # Simplified here; a real implementation must follow Baidu's documentation for chunked upload

    # 3. Submit the recognition task
    task_url = f"https://aip.baidubce.com/rpc/2.0/aspirer/v1/recognize?access_token={access_token}"
    task_data = {
        "speech": {"file_id": "your_file_id"},  # the file ID is returned by the upload step
        "format": "wav",
        "rate": 16000,
    }
    task_response = requests.post(task_url, json=task_data)
    task_id = task_response.json().get("task_id")

    # 4. Poll for the result
    result_url = f"https://aip.baidubce.com/rpc/2.0/aspirer/v1/get_result?access_token={access_token}"
    while True:
        result_data = {"task_id": task_id}
        result_response = requests.post(result_url, json=result_data)
        res = result_response.json()
        if res.get("status") == 2:  # finished
            return res["result"]
        elif res.get("status") == 3:  # failed
            raise Exception("Recognition Failed")
        else:
            time.sleep(1)  # poll once per second
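The polling loop above runs forever if the task never reaches a final status. A bounded variant might look like the following sketch. The helper poll_until_done and the callable fetch_status are hypothetical names; the status values 2 (finished) and 3 (failed) are taken from the loop above, and the default timeout is an arbitrary assumption.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports completion, failure, or the timeout elapses.

    fetch_status is a hypothetical zero-argument callable returning a dict shaped like
    the responses in the loop above: status 2 means finished, 3 means failed.
    """
    deadline = clock() + timeout
    while True:
        res = fetch_status()
        status = res.get("status")
        if status == 2:
            return res["result"]
        if status == 3:
            raise RuntimeError("Recognition failed")
        if clock() >= deadline:
            raise TimeoutError("Recognition did not finish in time")
        sleep(interval)  # wait before the next poll
```

Passing sleep and clock as parameters is only there to keep the helper testable without real waiting; in normal use the defaults are enough.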
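Check whether format and rate match the audio file. If the Access Token has expired, obtain a new one. Use pydub to normalize the sampling rate and format, which reduces API errors.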
from pydub import AudioSegment

def convert_audio(input_path, output_path, rate=16000):
    audio = AudioSegment.from_file(input_path)
    audio = audio.set_frame_rate(rate)
    audio.export(output_path, format="wav")
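Whether the request parameters match a WAV file can be checked locally before any request is sent. The sketch below reads the file header with the standard-library wave module and reports mismatches; the function name check_wav_params is hypothetical, and the check only applies to WAV files.

```python
import wave


def check_wav_params(path, expected_rate=16000, expected_channels=1):
    """Return a list of mismatches between a WAV file's header and the request parameters."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"rate is {wav.getframerate()}, expected {expected_rate}")
        if wav.getnchannels() != expected_channels:
            problems.append(f"channels is {wav.getnchannels()}, expected {expected_channels}")
    return problems
```

An empty list means the header agrees with the values that will be sent as rate and channel; otherwise the file can be converted first.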
import requests
import base64
import json

class BaiduASR:
    def __init__(self, api_key, secret_key):
        self.api_key = api_key
        self.secret_key = secret_key
        self.access_token = self._get_access_token()

    def _get_access_token(self):
        url = f"https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={self.api_key}&client_secret={self.secret_key}"
        response = requests.get(url)
        if response.status_code == 200:
            return response.json().get("access_token")
        else:
            raise Exception("Failed to get access token")

    def recognize(self, audio_path, format="wav", rate=16000):
        with open(audio_path, "rb") as f:
            audio_data = f.read()
        url = f"https://vop.baidu.com/server_api?cuid=test_device&token={self.access_token}"
        headers = {"Content-Type": "application/json"}
        params = {
            "format": format,
            "rate": rate,
            "channel": 1,
            "token": self.access_token,
            "cuid": "test_device",
            "len": len(audio_data),
            "speech": base64.b64encode(audio_data).decode("utf-8"),
        }
        response = requests.post(url, headers=headers, data=json.dumps(params))
        if response.status_code == 200:
            result = response.json()
            if result.get("err_no") == 0:
                return result["result"][0]
            else:
                raise Exception(f"API Error: {result.get('err_msg')}")
        else:
            raise Exception("HTTP Request Failed")

# Usage example
if __name__ == "__main__":
    api_key = "your_api_key"
    secret_key = "your_secret_key"
    asr = BaiduASR(api_key, secret_key)
    text = asr.recognize("test.wav")
    print("Recognition result:", text)
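This article walked through the full process of calling the Baidu speech recognition API from Python: environment setup, synchronous and asynchronous interface calls, error handling, and optimization tips. Developers can choose the recognition mode that fits their needs and improve performance through preprocessing and caching. Possible next steps include real-time speech recognition over WebSocket, or integrating the code into a Flask/Django application to build a complete speech service.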