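Summary: This article explains how to call the Baidu speech recognition API from Python. It covers environment setup, obtaining API keys, building the request, error handling, and optimization tips, so that developers can add speech-to-text quickly.
Using the Baidu speech recognition API comes down to building an HTTP request and handling the JSON response, with the requests library doing the network communication. Python 3.6+ is recommended for compatibility; install the dependency with pip install requests. To work with audio files, you can also install pydub for format conversion, for example converting MP3 to WAV (the Baidu API requires a 16 kHz or 8 kHz sample rate, 16-bit depth, and a single channel).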
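Example: audio format conversion

    from pydub import AudioSegment

    def convert_audio(input_path, output_path, sample_rate=16000):
        """Convert an audio file to 16-bit mono WAV at the given sample rate."""
        audio = AudioSegment.from_file(input_path)
        audio = audio.set_frame_rate(sample_rate)
        audio = audio.set_channels(1)      # mono, as the API requires
        audio = audio.set_sample_width(2)  # 2 bytes per sample = 16-bit
        audio.export(output_path, format="wav")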
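Before uploading, it can help to confirm that a converted file really meets the requirements stated above (8 kHz or 16 kHz, 16-bit, mono). The following is a minimal sketch using only the standard-library wave module; the helper name check_wav is hypothetical and not part of any Baidu SDK.

```python
import wave

def check_wav(path_or_file):
    """Return a list of problems; an empty list means the WAV looks acceptable."""
    problems = []
    # wave.open accepts either a file path or a binary file-like object
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate not in (8000, 16000):
        problems.append(f"sample rate is {rate} Hz, expected 8000 or 16000")
    if width != 2:
        problems.append(f"sample width is {width * 8} bits, expected 16")
    if channels != 1:
        problems.append(f"{channels} channels found, expected 1 (mono)")
    return problems
```

Calling check_wav("output.wav") right after convert_audio gives a quick local check before any request is sent.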
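Calling the API requires an API Key and a Secret Key. The API Key and Secret Key are exchanged for a temporary token (the Access Token), which is valid for 30 days and must be refreshed periodically to avoid service interruptions.
Access Token retrieval example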

    import base64
    import json
    import time

    import requests

    def get_access_token(api_key, secret_key):
        """Exchange the API Key and Secret Key for an Access Token."""
        auth_url = (
            "https://aip.baidubce.com/oauth/2.0/token"
            f"?grant_type=client_credentials&client_id={api_key}&client_secret={secret_key}"
        )
        response = requests.get(auth_url)
        # Returns None when the response carries no access_token field
        return response.json().get("access_token")
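Because the token is temporary, it is useful to record when it should be refreshed rather than hard-coding 30 days everywhere. The sketch below assumes the token response is a dict containing the standard OAuth 2.0 expires_in field (in seconds) and, on failure, an error_description field; both field names are assumptions to verify against the actual response. The helper name token_expiry is hypothetical.

```python
import time

DEFAULT_LIFETIME = 2592000  # 30 days in seconds, the validity period stated above

def token_expiry(token_response, now=None, safety_margin=300):
    """Return the Unix time at which a cached token should be refreshed."""
    if "access_token" not in token_response:
        # error_description is an assumed field name for the failure reason
        reason = token_response.get("error_description", token_response)
        raise ValueError(f"no access_token in response: {reason}")
    if now is None:
        now = time.time()
    # Fall back to 30 days when expires_in is absent (assumed field name)
    expires_in = int(token_response.get("expires_in", DEFAULT_LIFETIME))
    # Refresh slightly early so a request never goes out with a stale token
    return now + expires_in - safety_margin
```

Refreshing a few minutes early is a design choice: it trades a slightly shorter token lifetime for never sending a request with a token that expires in flight.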
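The Baidu speech recognition API supports several recognition modes, including real-time streaming recognition and file recognition. This article uses file recognition (asr_file) as the example. The key parameters are:
format: audio format (wav/pcm/amr/mp3)
rate: sample rate (8000/16000)
channel: number of channels (1)
cuid: unique device identifier (optional)
speech: binary audio data, Base64-encoded
Complete request example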

    def baidu_asr(audio_path, access_token):
        # Read the audio file and Base64-encode it
        with open(audio_path, "rb") as f:
            audio_data = f.read()
        speech = base64.b64encode(audio_data).decode("utf-8")

        # Build the request parameters
        url = f"https://vop.baidu.com/server_api?cuid=YOUR_CUID&token={access_token}"
        headers = {"Content-Type": "application/json"}
        data = {
            "format": "wav",
            "rate": 16000,
            "channel": 1,
            "speech": speech,
            "len": len(audio_data),  # size of the raw audio in bytes, before Base64 encoding
        }

        # Send the POST request
        response = requests.post(url, headers=headers, data=json.dumps(data))
        return response.json()
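The request body can also be built and validated separately from the network call, which makes parameter mistakes easier to catch locally. This is a minimal sketch based only on the parameter list above; the helper name build_asr_payload and the choice to reject empty audio are assumptions, not part of the API.

```python
import base64

ALLOWED_FORMATS = {"wav", "pcm", "amr", "mp3"}
ALLOWED_RATES = {8000, 16000}

def build_asr_payload(audio_bytes, audio_format="wav", rate=16000):
    """Validate the parameters and return the JSON-serializable request body."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported sample rate: {rate}")
    if not audio_bytes:
        raise ValueError("audio data is empty")
    return {
        "format": audio_format,
        "rate": rate,
        "channel": 1,  # mono only
        "speech": base64.b64encode(audio_bytes).decode("utf-8"),
        "len": len(audio_bytes),  # byte count of the raw audio, not of the Base64 text
    }
```

The returned dict can be passed to requests.post via json.dumps, exactly as the data variable is in baidu_asr.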
The API may return error statuses, which the response reports in its err_no and err_msg fields.
Enhanced error handling

    def safe_asr(audio_path, access_token):
        try:
            result = baidu_asr(audio_path, access_token)
            if result.get("err_no") != 0:
                raise Exception(f"API Error: {result.get('err_msg')}")
            return result["result"][0]  # return the recognized text
        except requests.exceptions.RequestException as e:
            raise Exception(f"Network Error: {str(e)}") from e
        except KeyError as e:
            raise Exception("Invalid API Response Format") from e
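Transient failures such as network timeouts are often worth retrying before giving up. The sketch below shows a generic retry helper with exponential backoff; the name retry_with_backoff and its parameters are hypothetical, and which errors are actually safe to retry is something to decide from the official error code table.

```python
import time

def retry_with_backoff(func, attempts=3, base_delay=0.5,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Wait 0.5 s, 1 s, 2 s, ... between attempts
            sleep(base_delay * (2 ** attempt))
```

A call might look like retry_with_backoff(lambda: safe_asr(path, token)). Since safe_asr raises a plain Exception for both network and API errors, the default retry_on retries both kinds; pass a narrower exception tuple if only network errors should be retried.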
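Audio preprocessing:
Use sox or ffmpeg to normalize the audio parameters so that the API does not reject the file.
Concurrency control:
Use asyncio or threading to send requests asynchronously, but cap the number of concurrent requests.
Result post-processing:
Clean up the recognized text (for example, markers such as the [语音] tag).
Asynchronous request example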

    import asyncio
    import base64

    import aiohttp

    async def async_asr(audio_path, access_token):
        url = f"https://vop.baidu.com/server_api?token={access_token}"
        with open(audio_path, "rb") as f:
            audio_data = f.read()
        speech = base64.b64encode(audio_data).decode("utf-8")
        # Same fields as the synchronous request, including channel and len
        data = {"format": "wav", "rate": 16000, "channel": 1,
                "speech": speech, "len": len(audio_data)}
        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=data) as resp:
                return await resp.json()
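The advice above to cap the number of concurrent requests can be sketched with asyncio.Semaphore. The helper name run_limited and the default limit of 5 are assumptions chosen for illustration; the appropriate limit depends on the quota of your account.

```python
import asyncio

async def run_limited(coro_factories, max_concurrency=5):
    """Run coroutine factories concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(factory):
        # Each job waits here until one of the max_concurrency slots is free
        async with semaphore:
            return await factory()

    # gather returns results in the same order as the input factories
    return await asyncio.gather(*(guarded(f) for f in coro_factories))
```

With the async_asr function shown earlier, a batch could be submitted as asyncio.run(run_limited([lambda p=p: async_asr(p, token) for p in paths])). Passing factories rather than coroutine objects means no request is created until a slot is available.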
Combining the pieces above, we can build a reusable speech recognition service:

    class BaiduASRService:
        def __init__(self, api_key, secret_key):
            self.api_key = api_key
            self.secret_key = secret_key
            self.access_token = None
            self.token_expire = 0

        def refresh_token(self):
            self.access_token = get_access_token(self.api_key, self.secret_key)
            self.token_expire = time.time() + 2592000  # expires after 30 days

        def recognize(self, audio_path, _retried=False):
            if time.time() > self.token_expire:
                self.refresh_token()
            try:
                result = baidu_asr(audio_path, self.access_token)
                if result.get("err_no") == 110 and not _retried:  # token invalid
                    self.refresh_token()
                    # Retry only once so a persistently rejected token cannot recurse forever
                    return self.recognize(audio_path, _retried=True)
                return result["result"][0]
            except Exception as e:
                print(f"Recognition failed: {str(e)}")
                return None
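Audio cannot be recognized:
Access Token expires frequently:
Recognition accuracy is low:
Set dev_pid=1737 (given here as the noisy-environment model; confirm the value against the dev_pid table in the official documentation, since model IDs are specific to a language and scenario). Set lan=zh to explicitly request Chinese recognition.
By calling the Baidu speech recognition API from Python, developers can quickly build voice interaction applications. The key steps are preparing the environment, obtaining the API keys and Access Token, building the request, and handling errors.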
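Directions to explore in the future:
Appendix: Baidu speech recognition API documentation link
Baidu speech recognition technical documentation
Developers are advised to check the official documentation regularly for the latest features (such as dialect recognition and emotion detection) and for quota changes.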