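Overview: This article uses Python and the mainstream speech libraries SpeechRecognition and PyAudio to implement two-way conversion between speech and text in 5 to 10 lines of core code. It covers the full flow of speech-to-text (ASR) and text-to-speech (TTS), including environment setup, dependency installation, and exception handling, and is aimed at developers who want to integrate speech features quickly.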
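Speech technology falls into two main directions: speech-to-text (ASR, automatic speech recognition) and text-to-speech (TTS). ASR uses acoustic models to convert an audio signal into text, while TTS uses speech synthesis to turn text into audible speech. In modern development there is no need to implement these algorithms from scratch; mature speech-processing libraries let developers build the functionality quickly.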
    # Install the core libraries (using pip as an example)
    pip install SpeechRecognition PyAudio gTTS pyttsx3
Notes:
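- PyAudio depends on the PortAudio library, which on Linux and macOS must be installed first through the system package manager (for example portaudio).
- If pip cannot build PyAudio, a prebuilt .whl file can be downloaded and installed manually.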
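Before running the examples, it can help to confirm which of the packages actually import. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical; the listed import names are the module names these packages expose, which differ from their pip names.

```python
import importlib.util

# Import names differ from pip names (e.g. SpeechRecognition -> speech_recognition).
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "gtts", "pyttsx3"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All speech dependencies are importable.")
```

A module that is found can still fail at runtime (for example PyAudio without a working PortAudio), so this check only catches missing installs.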
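    import speech_recognition as sr

    def speech_to_text():
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            print("Please speak...")
            try:
                # timeout=5: wait at most 5 seconds for speech to begin
                audio = recognizer.listen(source, timeout=5)
            except sr.WaitTimeoutError:
                print("No speech detected within the timeout")
                return
        try:
            text = recognizer.recognize_google(audio, language='zh-CN')  # Chinese recognition
            print("Recognition result:", text)
        except sr.UnknownValueError:
            print("Could not understand the audio")
        except sr.RequestError as e:
            print(f"API request failed: {e}")

    speech_to_text()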
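Code walkthrough:
- sr.Recognizer() creates the recognizer instance.
- sr.Microphone() serves as the audio source and supports live recording.
- recognize_google() calls the Google API; it requires a network connection and supports Chinese (zh-CN).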
    # Extra requirement: pip install pocketsphinx
    # A Chinese acoustic model must also be downloaded from
    # https://sourceforge.net/projects/cmusphinx/files/Acoustic%20Models/
    import speech_recognition as sr

    def offline_speech_to_text():
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        try:
            text = recognizer.recognize_sphinx(audio, language='zh-CN')  # requires the Chinese model
            print("Offline recognition result:", text)
        except Exception as e:
            print(f"Recognition failed: {e}")
Use cases: environments without network access, or privacy-sensitive scenarios. Accuracy is lower than with the online API.
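Since the online and offline recognizers trade accuracy against availability, one option is to try them in order and keep the first result. The sketch below is a generic fallback helper, not part of SpeechRecognition; the name `first_successful` is hypothetical, and the commented usage assumes a `recognizer` and `audio` obtained as in the examples above.

```python
def first_successful(attempts):
    """Run (label, callable) pairs in order; return (label, result) of the first success."""
    errors = []
    for label, attempt in attempts:
        try:
            return label, attempt()
        except Exception as exc:  # broad on purpose: each backend raises its own error types
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all recognizers failed: " + "; ".join(errors))


# Assumed usage with SpeechRecognition (not executed here):
# label, text = first_successful([
#     ("google", lambda: recognizer.recognize_google(audio, language="zh-CN")),
#     ("sphinx", lambda: recognizer.recognize_sphinx(audio, language="zh-CN")),
# ])
```

Note that this also falls back when the online service simply cannot understand the audio, which may or may not be the desired behavior.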
    from gtts import gTTS
    import os

    def text_to_speech_online(text):
        # Chinese, normal speed (older gTTS releases spelled the code 'zh-cn')
        tts = gTTS(text=text, lang='zh-CN', slow=False)
        tts.save("output.mp3")  # save as an MP3 file
        os.system("start output.mp3")  # Windows playback (use `afplay` on macOS, `mpg321` on Linux)

    text_to_speech_online("你好,世界!")
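Advantages: supports many languages and a normal/slow speed switch (slow), but it depends on the network, and API call limits need to be handled.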
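The playback step is the one platform-specific part of the gTTS example. A minimal sketch of selecting the command by operating system follows; the helper names `playback_command` and `play` are hypothetical, and the Linux branch assumes `mpg321` is installed.

```python
import platform
import subprocess


def playback_command(path, system=None):
    """Build the argument list for playing an audio file on the given OS."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string fills its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":  # macOS
        return ["afplay", path]
    return ["mpg321", path]  # assumption: mpg321 is available on Linux


def play(path):
    """Play the file with the platform's command and raise if the command fails."""
    subprocess.run(playback_command(path), check=True)
```

Passing an argument list to `subprocess.run` avoids the quoting problems of building a shell string for `os.system` when the file name contains spaces.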
    import pyttsx3

    def text_to_speech_offline(text):
        engine = pyttsx3.init()
        voices = engine.getProperty('voices')
        # Switch to a Chinese voice (requires system support); index 1 is system-dependent
        if len(voices) > 1:
            engine.setProperty('voice', voices[1].id)
        engine.say(text)
        engine.runAndWait()

    text_to_speech_offline("这是离线语音合成示例")
Configuration notes:
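- A Chinese voice must be available on the system (on Linux, for example, the Chinese data for espeak).
- Use engine.setProperty('rate', 150) to adjust the speaking rate.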
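Because the position of the Chinese voice in the list varies between systems, selecting by metadata is usually more robust than a fixed index. The following is a heuristic sketch; the function name `pick_voice_id` and the keyword list are assumptions, and it relies only on the `id`, `name`, and `languages` attributes of pyttsx3 voice objects.

```python
def pick_voice_id(voices, keywords=("zh", "chinese", "mandarin")):
    """Return the id of the first voice whose metadata mentions a keyword, else None."""
    for voice in voices:
        languages = getattr(voice, "languages", None) or []
        fields = [str(voice.id), str(getattr(voice, "name", ""))]
        fields += [str(lang) for lang in languages]  # entries may be str or bytes
        haystack = " ".join(fields).lower()
        if any(keyword in haystack for keyword in keywords):
            return voice.id
    return None


# Assumed usage with pyttsx3 (not executed here):
# engine = pyttsx3.init()
# voice_id = pick_voice_id(engine.getProperty("voices"))
# if voice_id is not None:
#     engine.setProperty("voice", voice_id)
```

Substring matching can produce false positives on unusual voice names, so it is worth printing the chosen voice once on each target system.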
    import speech_recognition as sr

    def preprocess_audio():
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)  # adapt to ambient noise
            audio = recognizer.listen(source)
        return audio
Purpose: improves the recognition rate in noisy environments.
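To give an intuition for why calibration helps, the sketch below illustrates energy-based thresholding: measure the background energy, then treat only louder frames as speech. It is a simplified illustration of the general idea, not SpeechRecognition's actual implementation; all function names and the `margin` value are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square energy of one frame of PCM samples (plain integers)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_frames, margin=1.5):
    """Place the speech threshold a fixed margin above the average background energy."""
    energies = [rms(frame) for frame in noise_frames]
    return margin * sum(energies) / len(energies)


def is_speech(frame, threshold):
    """Treat a frame as speech when its energy exceeds the calibrated threshold."""
    return rms(frame) > threshold


if __name__ == "__main__":
    threshold = calibrate_threshold([[10, -10], [20, -20]])
    print(is_speech([100, -100], threshold), is_speech([5, -5], threshold))
```

In a noisy room the calibrated threshold rises, so background sound is less likely to be mistaken for the start of a phrase.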
    import threading
    import speech_recognition as sr

    def async_recognition():
        def recognize_thread():
            recognizer = sr.Recognizer()
            with sr.Microphone() as source:
                audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_google(audio)
                print("Asynchronous recognition result:", text)
            except Exception as e:
                print(e)

        thread = threading.Thread(target=recognize_thread)
        thread.start()

    async_recognition()
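The thread-based version only prints its result. When the caller needs the recognized text back, a `concurrent.futures` executor is one alternative. The sketch below is library-agnostic: `recognize_in_background` is a hypothetical helper, and `recognize` stands for any blocking callable, such as a function wrapping `recognize_google`.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_in_background(executor, recognize, on_done):
    """Submit a blocking recognize() call; deliver (text, error) to on_done when finished."""
    future = executor.submit(recognize)

    def _deliver(done_future):
        error = done_future.exception()
        if error is not None:
            on_done(None, error)
        else:
            on_done(done_future.result(), None)

    future.add_done_callback(_deliver)
    return future


if __name__ == "__main__":
    def fake_recognize():  # stand-in for a real recognition call
        return "hello"

    with ThreadPoolExecutor(max_workers=1) as pool:
        recognize_in_background(
            pool, fake_recognize,
            lambda text, err: print("text:", text, "error:", err),
        )
```

The callback runs on the worker thread, so GUI applications would still need to hand the result over to their main thread.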
    import speech_recognition as sr
    from gtts import gTTS
    import os

    class VoiceAssistant:
        def __init__(self):
            self.recognizer = sr.Recognizer()

        def listen(self):
            with sr.Microphone() as source:
                print("Waiting for a command...")
                self.recognizer.adjust_for_ambient_noise(source)
                # timeout=3: raises sr.WaitTimeoutError if no speech starts within 3 seconds
                return self.recognizer.listen(source, timeout=3)

        def recognize(self, audio):
            try:
                return self.recognizer.recognize_google(audio, language='zh-CN')
            except (sr.UnknownValueError, sr.RequestError) as e:
                print(f"Recognition failed: {e}")
                return None  # keep error messages out of the recognized text

        def speak(self, text):
            tts = gTTS(text=text, lang='zh-CN')
            tts.save("temp.mp3")
            os.system("start temp.mp3")  # Windows only; replace the command on other platforms

        def run(self):
            while True:
                try:
                    audio = self.listen()
                except sr.WaitTimeoutError:
                    continue  # nothing was said; listen again
                text = self.recognize(audio)
                if text is None:
                    continue
                print(f"User said: {text}")
                if "退出" in text:
                    self.speak("再见!")
                    break
                self.speak(f"你刚才说了:{text}")

    if __name__ == "__main__":
        assistant = VoiceAssistant()
        assistant.run()
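As the assistant gains more commands, the keyword checks inside the loop can be moved into a small dispatch table. The following is a sketch under stated assumptions: `route_command`, `HANDLERS`, and `echo` are hypothetical names, and the "你好" entry is an invented example command; only the "退出" keyword and the echo reply come from the assistant above.

```python
def route_command(text, handlers, default):
    """Return the reply of the first handler whose keyword occurs in the text."""
    for keyword, handler in handlers:
        if keyword in text:
            return handler(text)
    return default(text)


# (keyword, handler) pairs are checked in order; the first match wins.
HANDLERS = [
    ("退出", lambda text: "再见!"),
    ("你好", lambda text: "你好!"),  # hypothetical extra command
]


def echo(text):
    """Fallback reply: repeat what the user said."""
    return f"你刚才说了:{text}"


if __name__ == "__main__":
    print(route_command("请退出", HANDLERS, echo))
    print(route_command("今天天气不错", HANDLERS, echo))
```

The returned string can be passed straight to a `speak` method, which keeps speech I/O separate from command logic and makes the routing testable without a microphone.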
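Low recognition accuracy:
- Tune the listening parameters (for example timeout=3).
- Calibrate for background noise (adjust_for_ambient_noise).

Stiff-sounding TTS output:
- Adjust the speaking rate (slow=True or the rate parameter).

Cross-platform compatibility:
- Choose the playback command by operating system (start on Windows, afplay on macOS).

This article implemented the complete speech recognition and synthesis flow in about 10 lines of core code. Developers can choose the online approach (higher accuracy) or the offline approach (privacy protection) as needed. Directions for further optimization include: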
Recommended resources:
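With a modular design, the code above can be extended into applications such as intelligent customer service or voice note-taking, helping developers bring voice interaction scenarios into production quickly.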