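Overview: This article explains how to use Python to integrate the Baidu speech recognition API with the Tuling Robot API to build a dialogue system with voice interaction. It covers environment setup, API calls, exception handling, and a complete code implementation, and is aimed at developers who want to get started quickly.
The system uses a three-layer architecture:
Baidu speech recognition was chosen over other options mainly because: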
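Python 3.7+ is recommended, with the following dependencies installed:

    pip install baidu-aip requests pyaudio

Where:
- baidu-aip: the official SDK of the Baidu AI open platform
- requests: HTTP request library
- pyaudio: audio capture library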

    from aip import AipSpeech

    # Initialize the speech recognition client
    APP_ID = 'your_app_id'
    API_KEY = 'your_api_key'
    SECRET_KEY = 'your_secret_key'
    client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

    def recognize_speech(audio_file):
        # Read the whole recording and submit it for recognition
        with open(audio_file, 'rb') as f:
            audio_data = f.read()
        result = client.asr(audio_data, 'wav', 16000, {
            'dev_pid': 1537,  # Mandarin Chinese recognition
        })
        if result['err_no'] == 0:
            return result['result'][0]
        else:
            raise Exception(f"Recognition failed: {result['err_msg']}")

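Because `recognize_speech` declares the audio as `'wav'` at `16000` Hz, it can help to verify a file before uploading it. The following is a minimal sketch using only the standard-library `wave` module; the helper name `check_wav_format` is hypothetical, and the mono, 16-bit defaults are an assumption taken from the recording parameters used later in this article, so check them against the Baidu documentation for your account.

```python
import wave

def check_wav_format(path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return a list of format problems; an empty list means the file matches.

    expected_width is the sample width in bytes (2 bytes = 16-bit PCM).
    The defaults are assumptions, not values confirmed by the Baidu API docs.
    """
    problems = []
    with wave.open(path, 'rb') as wf:
        if wf.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {expected_rate} Hz")
        if wf.getnchannels() != expected_channels:
            problems.append(
                f"channel count is {wf.getnchannels()}, expected {expected_channels}")
        if wf.getsampwidth() != expected_width:
            problems.append(
                f"sample width is {wf.getsampwidth()} bytes, expected {expected_width}")
    return problems
```

Calling `check_wav_format('temp.wav')` before `recognize_speech('temp.wav')` lets the program report a local format mismatch instead of waiting for a remote error.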
For real-time interaction, the WebSocket protocol is recommended:
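
    # Requires the websocket-client package: pip install websocket-client
    import websocket
    import json
    import base64

    def realtime_recognition():
        def on_message(ws, message):
            # Print the text once the service reports a final result
            data = json.loads(message)
            if data['type'] == 'FINAL_RESULT':
                print("Recognition result:", data['result']['text'])

        ws = websocket.WebSocketApp("wss://vop.baidu.com/websocket_asr",
                                    on_message=on_message)
        # Authentication and initialization flow...
        ws.run_forever()
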
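Streaming recognition usually means sending the audio in small, evenly sized frames rather than as one file. The sketch below shows only that slicing step; `iter_pcm_chunks` is a hypothetical helper, and the 160 ms chunk duration is an assumption that should be checked against the service's documentation.

```python
def iter_pcm_chunks(pcm_bytes, rate=16000, sample_width=2, chunk_ms=160):
    """Yield consecutive slices of raw PCM audio, each about chunk_ms long.

    rate is in Hz and sample_width in bytes per sample (mono audio assumed).
    The last chunk may be shorter than the others.
    """
    samples_per_chunk = rate * chunk_ms // 1000
    chunk_size = samples_per_chunk * sample_width
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for the given sample rate")
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]
```

Each yielded chunk could then be passed to `ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)` from the websocket-client package once the connection is authenticated.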
The following exceptions need particular attention:
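Whatever the specific failure modes, transient errors such as network timeouts are commonly handled by retrying with a growing delay. The following is a minimal sketch of that pattern; `call_with_retry` and its parameters are hypothetical names introduced for illustration, not part of the Baidu SDK.

```python
import time

def call_with_retry(func, *args, retries=3, base_delay=0.5,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(*args), retrying on the given exception types.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last exception is re-raised once all attempts are used up.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func(*args)
        except retry_on as exc:
            last_error = exc
            # Do not sleep after the final failed attempt
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

For example, `call_with_retry(recognize_speech, 'temp.wav', retries=3)` would retry a failed recognition call up to three times. In practice `retry_on` should be narrowed to the errors that are actually transient.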

    import requests

    TULING_API_KEY = 'your_tuling_api_key'

    def get_tuling_response(text, user_id='test_user'):
        url = "http://openapi.tuling123.com/openapi/api/v2"
        data = {
            "reqType": 0,
            "perception": {"inputText": {"text": text}},
            "userInfo": {"apiKey": TULING_API_KEY, "userId": user_id}
        }
        # A timeout keeps the assistant from hanging on a stalled connection
        response = requests.post(url, json=data, timeout=10)
        response.raise_for_status()
        return response.json()['results'][0]['values']['text']

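The expression `response.json()['results'][0]['values']['text']` raises `KeyError` or `IndexError` if the reply lacks that structure, for instance when the service returns an error payload. The following sketch shows a more defensive extraction; `extract_reply_text` and its fallback message are hypothetical, and the assumption that some result entries may carry no `text` value has not been verified against the Tuling documentation.

```python
def extract_reply_text(payload, default="Sorry, I did not catch that."):
    """Return the first text reply in a Tuling-style response dict.

    Falls back to `default` when the payload has no usable text entry.
    """
    results = payload.get('results') if isinstance(payload, dict) else None
    if not isinstance(results, list):
        return default
    for item in results:
        values = item.get('values') if isinstance(item, dict) else None
        if isinstance(values, dict) and isinstance(values.get('text'), str):
            return values['text']
    return default
```

With this helper, the last line of `get_tuling_response` could become `return extract_reply_text(response.json())`.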
Implementing dialogue state tracking is recommended:

    import uuid

    class DialogManager:
        def __init__(self):
            self.context = {}

        def process(self, text, user_id):
            # Save context information
            self.context[user_id] = {
                'last_question': text,
                'session_id': str(uuid.uuid4())
            }
            # Call the Tuling API
            response = get_tuling_response(text, user_id)
            # Update the context ('需要追问' means "follow-up needed")
            if '需要追问' in response:
                self.context[user_id]['need_followup'] = True
            return response

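In the `DialogManager` above, `process` generates a fresh `session_id` on every call, so earlier turns are not linked to later ones. One possible refinement, sketched below, keeps a session alive until the user has been idle for a while. The class `SessionStore`, its `ttl_seconds` parameter, and the 300-second default are hypothetical choices for illustration.

```python
import time
import uuid

class SessionStore:
    """Keep one session per user and start a new one after an idle timeout."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the timeout can be tested
        self._sessions = {}

    def get(self, user_id):
        """Return the user's live session, creating one if missing or expired."""
        now = self.clock()
        session = self._sessions.get(user_id)
        if session is None or now - session['last_seen'] > self.ttl:
            session = {'session_id': str(uuid.uuid4()), 'history': []}
            self._sessions[user_id] = session
        session['last_seen'] = now
        return session

    def record_turn(self, user_id, question, answer):
        """Append one question/answer pair to the user's session history."""
        session = self.get(user_id)
        session['history'].append((question, answer))
        return session
```

A dialogue manager could call `record_turn(user_id, text, response)` after each API reply and inspect `history` when deciding whether a follow-up question is pending.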

    import pyaudio
    import wave
    import threading

    class VoiceAssistant:
        def __init__(self):
            self.dialog_mgr = DialogManager()
            self.running = False

        def record_audio(self, filename, duration=5):
            CHUNK = 1024
            FORMAT = pyaudio.paInt16
            CHANNELS = 1
            RATE = 16000

            p = pyaudio.PyAudio()
            stream = p.open(format=FORMAT,
                            channels=CHANNELS,
                            rate=RATE,
                            input=True,
                            frames_per_buffer=CHUNK)
            print("Recording started...")
            frames = []
            for _ in range(0, int(RATE / CHUNK * duration)):
                data = stream.read(CHUNK)
                frames.append(data)
            print("Recording finished")

            stream.stop_stream()
            stream.close()
            p.terminate()

            # Write the captured frames to a WAV file
            wf = wave.open(filename, 'wb')
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(p.get_sample_size(FORMAT))
            wf.setframerate(RATE)
            wf.writeframes(b''.join(frames))
            wf.close()

        def start_interaction(self):
            self.running = True
            while self.running:
                self.record_audio('temp.wav', 3)
                try:
                    text = recognize_speech('temp.wav')
                    print(f"You said: {text}")
                    response = self.dialog_mgr.process(text, 'default_user')
                    print(f"Reply: {response}")
                    # Optionally call a speech synthesis API to speak the reply
                except Exception as e:
                    print(f"Processing error: {e}")

    if __name__ == "__main__":
        assistant = VoiceAssistant()
        assistant.start_interaction()

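The loop in `start_interaction` uploads every 3-second recording, including ones where nobody spoke. One possible optimization is to measure the loudness of the captured frames and skip silent recordings. The sketch below is an illustration under several assumptions: the helper names are hypothetical, the threshold of 500 is arbitrary and needs tuning per microphone, and the samples are assumed to be 16-bit signed PCM in the machine's native byte order (as `paInt16` produces on common little-endian hardware).

```python
import array
import math

def rms_level(pcm_bytes):
    """Return the root-mean-square amplitude of 16-bit signed PCM audio."""
    samples = array.array('h')  # 'h' = signed 16-bit integers
    usable = len(pcm_bytes) - len(pcm_bytes) % 2  # drop a trailing odd byte
    samples.frombytes(pcm_bytes[:usable])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silent(pcm_bytes, threshold=500):
    """Return True when the audio is quieter than the (assumed) threshold."""
    return rms_level(pcm_bytes) < threshold
```

Inside `record_audio`, the check could be applied to `b''.join(frames)` before the WAV file is written, so that the caller can skip the recognition request for silent input.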
A sample Dockerfile:

    FROM python:3.8-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    CMD ["python", "main.py"]

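In a test environment, this implementation achieved:
With reasonable optimization, the system can be applied to scenarios such as intelligent customer service, home assistants, and educational tutoring, and developers can adjust the functional modules to suit their own requirements.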