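Summary: This article explains how to build ChatGPT voice interaction on a Raspberry Pi running Linux. It covers speech recognition, TTS integration, and optimization strategies, and provides complete code examples and a deployment guide.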
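As an edge computing device, a Raspberry Pi running Linux (for example Raspberry Pi OS) can host a complete voice interaction pipeline by combining speech recognition, a TTS engine, and the ChatGPT API. The core components are speech recognition (speech to text), the ChatGPT API (dialogue generation), and TTS (text to speech).

The recommended toolchain is open source:

- Vosk for offline speech recognition
- Piper for TTS
- The `openai` library with asynchronous request handling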
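```bash
# Update system packages
sudo apt update && sudo apt upgrade -y

# Install Python 3 and pip
sudo apt install python3 python3-pip -y

# Configure the audio device (USB microphone as an example)
arecord -l                   # Find the device ID
sudo nano /etc/asound.conf   # Set the default device
```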
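```bash
# Speech recognition (Vosk); PyAudio is used for microphone capture
pip3 install vosk
sudo apt install libportaudio2 python3-pyaudio -y

# TTS engine (Piper)
sudo apt install ffmpeg -y
pip3 install pydub
git clone https://github.com/rhasspy/piper.git
cd piper && ./install.sh

# ChatGPT API client (the examples use the legacy ChatCompletion interface,
# which was removed in openai 1.0)
pip3 install "openai<1.0"
```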
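```python
from vosk import Model, KaldiRecognizer
import pyaudio

# Load the offline model and open the microphone at 16 kHz, mono
model = Model("path/to/vosk-model-small-en-us-0.15")
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=4096)
rec = KaldiRecognizer(model, 16000)

while True:
    data = stream.read(4096)
    # AcceptWaveform returns True once a complete utterance has been decoded
    if rec.AcceptWaveform(data):
        print(rec.Result())  # Print the recognition result (a JSON string)
```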
Optimization tips:

- Use `vosk-model-small` to reduce memory usage (about 50 MB).
- Test audio input with `arecord --duration=5 --format=S16_LE --rate=16000 test.wav`.
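Vosk's `Result()` returns a JSON string rather than plain text, with the transcript stored under the `"text"` key. The helper below is a minimal sketch of extracting it before passing the text on; the name `extract_text` is hypothetical and not part of Vosk.

```python
import json

def extract_text(result_json):
    """Return the transcript from a Vosk result string, or "" if there is none."""
    try:
        payload = json.loads(result_json)
    except json.JSONDecodeError:
        # Not valid JSON: treat it as "nothing recognized"
        return ""
    if not isinstance(payload, dict):
        return ""
    return str(payload.get("text", "")).strip()
```

In the recognition loop this would be used as `text = extract_text(rec.Result())`, skipping the turn when the returned string is empty.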
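```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    try:
        # Wait up to 5 seconds for speech to start
        audio = r.listen(source, timeout=5)
        # Online recognition through Google's web speech API (needs network access)
        text = r.recognize_google(audio, language="en-US")
        print(f"Recognized: {text}")
    except Exception as e:
        print(f"Error: {e}")
```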
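Vosk runs offline, while `recognize_google` depends on a network service. One possible way to combine the two is to try the online recognizer first and fall back to the offline one when it fails or returns nothing. This is only a sketch: `recognize_with_fallback` and the two callables passed to it are hypothetical wrappers, not functions from either library.

```python
def recognize_with_fallback(audio, online, offline):
    """Try the online recognizer first; use the offline one if it fails.

    `online` and `offline` are caller-supplied functions (hypothetical
    wrappers) that take audio data and return a transcript string.
    Returns a (text, source) tuple, where source is "online" or "offline".
    """
    try:
        text = online(audio)
        if text:
            return text, "online"
    except Exception:
        # Network errors, timeouts, or API failures all lead to the fallback
        pass
    return offline(audio), "offline"
```

The second element of the returned tuple makes it easy to log how often the fallback path is taken.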
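```python
import openai  # Legacy ChatCompletion interface; requires openai<1.0

openai.api_key = "YOUR_API_KEY"

def chat_with_gpt(prompt):
    # Send a single-turn prompt and return the assistant's reply text
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message["content"]
```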
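```python
import openai  # Legacy ChatCompletion interface; requires openai<1.0

def stream_chat(prompt):
    # Request a streamed response and print each fragment as it arrives
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in response:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```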
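Streamed fragments rarely line up with sentence boundaries, so passing them to a TTS engine one at a time would sound choppy. The sketch below shows one way to group fragments into whole sentences so that speech can start before the full reply has arrived. The function name `sentences_from_deltas` is hypothetical, and splitting on `.`, `!`, and `?` is a simplifying assumption that will also split on decimal points.

```python
def sentences_from_deltas(deltas, terminators=".!?"):
    """Group streamed text fragments into complete sentences."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        # Emit every complete sentence currently held in the buffer
        while True:
            cut = next((i for i, ch in enumerate(buffer) if ch in terminators), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[:cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    # Flush whatever remains when the stream ends
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be passed to a TTS function while later fragments are still being received.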
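```python
import subprocess

def text_to_speech(text, output_file="output.wav"):
    # Piper reads the text to synthesize from standard input
    cmd = [
        "piper",
        "--model", "en_US-lessac-medium.onnx",
        "--output_file", output_file,
    ]
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)

def play_audio(file_path):
    # Play the WAV file through ALSA
    subprocess.run(["aplay", file_path], check=True)
```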
```python
import pygame

pygame.mixer.init()

def play_tts_realtime(text):
    # Generate a temporary audio file (text_to_speech is defined above)
    text_to_speech(text, "temp.wav")
    # Play it and block until playback has finished
    sound = pygame.mixer.Sound("temp.wav")
    sound.play()
    while pygame.mixer.get_busy():
        pygame.time.wait(100)
```
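```python
import asyncio
import json

import pyaudio
from vosk import Model, KaldiRecognizer

# chat_with_gpt, text_to_speech and play_audio are the functions defined earlier.

# Initialize components
model = Model("vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=4096)

async def main_loop():
    while True:
        print("Waiting for speech...")
        # Feed audio to Vosk until it reports a complete utterance
        while True:
            chunk = stream.read(4096, exception_on_overflow=False)
            if recognizer.AcceptWaveform(chunk):
                break
        # Result() returns a JSON string; the transcript is in its "text" field
        text = json.loads(recognizer.Result()).get("text", "")
        if not text:
            continue
        print(f"You said: {text}")
        # Call ChatGPT in a worker thread so the event loop is not blocked
        response = await asyncio.to_thread(chat_with_gpt, text)
        print(f"ChatGPT: {response}")
        # TTS output
        await asyncio.to_thread(text_to_speech, response)
        await asyncio.to_thread(play_audio, "output.wav")

# Start the asynchronous loop
asyncio.run(main_loop())
```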
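- Use `vosk-model-tiny` (20 MB) in place of the full model.

```python
from threading import Thread

# Read audio in a background thread; `stream` is the PyAudio stream opened earlier
def audio_thread():
    while True:
        data = stream.read(4096)
        # Process the audio data here

thread = Thread(target=audio_thread)
thread.daemon = True
thread.start()
```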
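A background reader is most useful when it hands its chunks to the rest of the program through a thread-safe queue. The following is a minimal sketch of that pattern using only the standard library. `start_reader`, `read_chunk`, and `chunk_count` are hypothetical names introduced for illustration; in the real pipeline `read_chunk` would wrap `stream.read(4096)`.

```python
import queue
from threading import Thread

def start_reader(read_chunk, out_queue, chunk_count=None):
    """Read chunks in a daemon thread and push them onto out_queue.

    read_chunk: zero-argument callable returning one chunk of audio bytes.
    chunk_count: stop after this many chunks (None means run forever).
    A None sentinel is queued when the reader stops.
    """
    def worker():
        n = 0
        while chunk_count is None or n < chunk_count:
            out_queue.put(read_chunk())
            n += 1
        out_queue.put(None)  # Sentinel: no more audio

    thread = Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The consumer side then calls `out_queue.get()` in its own loop, so recognition never has to wait on the microphone read.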
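### 8. Deployment and Debugging Tips

1. **Logging**:

```python
import logging

logging.basicConfig(
    filename='chatbot.log',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
```

2. **Audio loopback test** (record from the microphone and play it straight back):

```bash
arecord -D plughw:1,0 -f cd -t wav | aplay -D plughw:1,0
```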
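- Use `alsamixer` to adjust the input gain.
- Add swap space if memory runs short (`sudo fallocate -l 2G /swapfile`).

| Component | Resource usage on Raspberry Pi 4B | Estimated cost |
|---|---|---|
| Vosk recognition | 15% CPU | Free |
| ChatGPT API | $0.002 per request | Pay-as-you-go |
| Piper TTS | 100 MB memory | Free |
| Total | <50% of resources | <$1/month |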
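The two cost figures in the table can be checked against each other with simple arithmetic: at $0.002 per request, a budget of $1 per month covers 500 requests, or roughly 16 per day. The sketch below uses only the table's own estimate. Actual API billing is per token, so the per-request price is an assumption, and the function names are hypothetical.

```python
PRICE_PER_REQUEST_USD = 0.002  # Per-request estimate taken from the table

def monthly_cost(requests_per_day, days=30, price=PRICE_PER_REQUEST_USD):
    """Estimated monthly API cost in USD for a given daily request count."""
    return requests_per_day * days * price

def max_daily_requests(budget_usd, days=30, price=PRICE_PER_REQUEST_USD):
    """Largest whole number of daily requests that stays within the budget."""
    return int(budget_usd / (days * price))
```

For example, `max_daily_requests(1.0)` gives the daily request ceiling for a $1 monthly budget, and `monthly_cost` turns an expected usage level back into dollars.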
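Conclusion: Combining speech recognition, TTS, and ChatGPT on a Raspberry Pi running Linux yields a low-cost, highly customizable voice interaction terminal. For real deployments, choose between offline and online components according to the use case, and improve responsiveness through asynchronous processing and hardware optimization. Keeping the full codebase and model files under Git is recommended for version control and collaboration.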