语音对话COOKBOOK
更新时间:2025-01-17
目标
实现一个语音对话功能,支持多种语音音色。用户可以参考cookbook代码,通过AppBuilder-SDK将语音功能很好地融入自己的平台、应用中。
实现原理
通过循环不断处理用户的语音,将语音转文本,然后进行对话,最后将对话结果通过TTS进行播报。
- 使用大模型的 ASR 进行语音转文本。
 - 使用用户自己创建的Agent进行对话,适配用户的应用场景,并具有上下文理解能力。
 - 使用大模型的 TTS 进行文本转语音并进行播报。
 
前置条件
- 使用内置ASR、TTS组件之前,请先开通组件服务 ( 短语音识别-极速版 、 短文本在线合成 ) 并够买额度,可参考开通组件服务。
 - pip安装pyaudio、webrtcvad依赖包
 - 给程序开放麦克风权限
 - 创建好自己的Agent应用
 
示例代码
                Plain Text
                
            
            1 # Copyright (c) 2024 Baidu, Inc. All Rights Reserved.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
4 # you may not use this file except in compliance with the License.
5 # You may obtain a copy of the License at
6 #
7 #     http://www.apache.org/licenses/LICENSE-2.0
8 #
9 # Unless required by applicable law or agreed to in writing, software
10 # distributed under the License is distributed on an "AS IS" BASIS,
11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
14
15 import os
16 import time
17 import wave
18 import sys
19 import pyaudio
20 import webrtcvad
21 import appbuilder
22 import re
23
24 # 请前往千帆AppBuilder官网创建密钥,流程详见
25 https://cloud.baidu.com/doc/AppBuilder/s/Olq6grrt6#1%E3%80%81%E5%88%9B%E5%BB%BA%E5%AF%86%E9%92%A5
26 
27 # 设置环境变量
28 os.environ["APPBUILDER_TOKEN"] = (
29    "..."
30 )
31 # 已发布AppBuilder应用的ID
32 app_id = "..."
33 appbuilder.logger.setLoglevel("ERROR")
34
35 CHUNK = 1024
36 FORMAT = pyaudio.paInt16
37 CHANNELS = 1 if sys.platform == "darwin" else 2
38 RATE = 16000
39 DURATION = 30  # ms
40 CHUNK = RATE // 1000 * DURATION
41
42
43 class Chatbot:
44    def __init__(self):
45        self.p = pyaudio.PyAudio()
46        self.tts = appbuilder.TTS()
47        self.asr = appbuilder.ASR()
48        self.agent = appbuilder.AppBuilderClient(app_id)
49        self.conversation_id = self.agent.create_conversation()
50
51    def run(self):
52        self.run_tts_and_play_audio(
53            "我是你的专属聊天机器人,如果你有什么问题,可以直接问我"
54        )
55        while True:
56            # Record
57            audio_path = "output.wav"
58            print("开始记录音频...")
59            if self.record_audio(audio_path) < 1000:
60                time.sleep(1)
61                continue
62            print("音频记录结束")
63
64            # ASR
65            print("开始执行ASR...")
66            query = self.run_asr(audio_path)
67            print("结束执行ASR")
68
69            # Agent
70            print("query: ", query)
71            if len(query) == 0:
72                continue
73            answer = self.run_agent(query)
74            results = re.findall(r"(https?://[^\s]+)", answer)
75            for result in results:
76                print("链接地址:", result)
77                answer = answer.replace(result, "")
78            print("answer:", answer)
79
80            # TTS
81            print("开始执行TTS并播报...")
82            self.run_tts_and_play_audio(answer)
83            print("结束TTS并播报结束")
84
85    def record_audio(self, path):
86        with wave.open(path, "wb") as wf:
87            wf.setnchannels(CHANNELS)
88            wf.setsampwidth(self.p.get_sample_size(FORMAT))
89            wf.setframerate(RATE)
90            stream = self.p.open(
91                format=FORMAT, channels=CHANNELS, rate=RATE, input=True
92            )
93            vad = webrtcvad.Vad(1)
94            not_speech_times = 0
95            speech_times = 0
96            total_times = 0
97            start_up_times = 33 * 5  # 初始时间设置为5秒
98            history_speech_times = 0
99            while True:
100                if history_speech_times > 33 * 10:
101                    break
102                data = stream.read(CHUNK, False)
103                if vad.is_speech(data, RATE):
104                    speech_times += 1
105                    wf.writeframes(data)
106                else:
107                    not_speech_times += 1
108                total_times += 1
109                if total_times >= start_up_times:
110                    history_speech_times += speech_times
111                    # 模拟滑窗重新开始计数
112                    if float(not_speech_times) / float(total_times) > 0.7:
113                        break
114                    not_speech_times = 0
115                    speech_times = 0
116                    total_times = 0
117                    start_up_times = start_up_times / 2
118                    if start_up_times < 33:
119                        start_up_times = 33
120            stream.close()
121            return history_speech_times * DURATION
122
123    def run_tts_and_play_audio(self, text: str):
124        # AppBuilder内置的TTS使用文档,用户可根据文档调整参数:
125        https://github.com/baidubce/app-builder/tree/master/python/core/components/tts
126        
127        msg = self.tts.run(
128            appbuilder.Message(content={"text": text}),
129            speed=5,
130            pitch=5,
131            volume=5,
132            person=0,
133            audio_type="pcm",
134            model="paddlespeech-tts",
135            stream=True,
136        )
137        stream = self.p.open(
138            format=self.p.get_format_from_width(2),
139            channels=1,
140            rate=24000,
141            output=True,
142            frames_per_buffer=2048,
143        )
144        for pcm in msg.content:
145            stream.write(pcm)
146        stream.stop_stream()
147        stream.close()
148
149    # AppBuilder内置的ASR使用文档,用户可根据文档调整参数:
150    https://github.com/baidubce/app-builder/blob/master/python/core/components/asr/README.md
151    
152    def run_asr(self, audio_path: str):
153        with open(audio_path, "rb") as f:
154            content_data = {"audio_format": "wav", "raw_audio": f.read(), "rate": 16000}
155            msg = appbuilder.Message(content_data)
156            out = self.asr.run(msg)
157            text = out.content["result"][0]
158            return text
159
160    def run_agent(self, query):
161        msg = self.agent.run(self.conversation_id, query, stream=True)
162        answer = ""
163        for content in msg.content:
164            answer += content.answer
165        return answer
166
167
168 if __name__ == "__main__":
169    chatbot = Chatbot()
170    chatbot.run()
171     
            使用方法
直接运行程序即可。
也可以将下面的功能模块替换成自己的其他实现或模型:
- record_audio: 录音
 - run_asr: 语音识别,AppBuilder ASR组件使用文档
 - run_agent: Agent对话功能。
 - run_tts_and_play_audio:回复的语音生成并播报。AppBuilder TTS组件使用文档
 
流式TTS已经上线,测试配额申请地址配额。
