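Summary: This article walks through speech-to-text in Python with the SpeechRecognition library, covering installation and configuration, basic usage, advanced features, and error handling, so that developers can get productive with speech recognition quickly.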
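SpeechRecognition is one of the most popular speech recognition libraries in the Python ecosystem. It supports multiple recognition engines (such as the Google Web Speech API, Microsoft Bing Voice Recognition, and CMU Sphinx), so developers can add speech recognition quickly without a deep understanding of the underlying algorithms.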
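Typical application scenarios include voice assistants, automated meeting transcription, accessibility tools, and smart home control. In healthcare, for example, one hospital used SpeechRecognition to transcribe doctors' dictated medical records in real time, cutting the entry time for a single record from 15 minutes to 2 minutes.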
Installation:

    pip install SpeechRecognition pyaudio
    # For offline recognition with CMU Sphinx
    pip install pocketsphinx

Common issue: if installing pyaudio fails on Windows, download the .whl file that matches your Python version and install it manually.
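Before debugging a failed install, it can help to confirm which packages the active interpreter can actually find. The following is a minimal sketch using only the standard library. The helper name check_speech_dependencies is made up for this illustration, and the module names are the import names these packages normally install under (speech_recognition, pyaudio, pocketsphinx).

```python
import importlib.util

# Import names (not pip names) of the packages used in this article.
MODULES = {
    "speech_recognition": "SpeechRecognition (core library)",
    "pyaudio": "PyAudio (microphone input)",
    "pocketsphinx": "PocketSphinx (offline CMU Sphinx engine)",
}

def check_speech_dependencies(modules=MODULES):
    """Return {import_name: bool} telling whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, found in check_speech_dependencies().items():
        status = "OK" if found else "missing"
        print(f"{MODULES[name]:45s} {status}")
```

Running this in the same environment that pip installed into shows quickly whether a failure comes from a missing package or from something else, such as a missing microphone.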
Basic usage:

    import speech_recognition as sr

    # Create a recognizer instance
    recognizer = sr.Recognizer()

    # Capture audio from the source (microphone or file)
    with sr.Microphone() as source:
        print("Say something...")
        audio = recognizer.listen(source)

    # Run recognition
    try:
        text = recognizer.recognize_google(audio, language='zh-CN')
        print("Recognized text:", text)
    except sr.UnknownValueError:
        print("Could not understand the audio")
    except sr.RequestError as e:
        print(f"Service error: {e}")

Microphone input:

    with sr.Microphone(sample_rate=44100) as source:
        recognizer.adjust_for_ambient_noise(source)   # calibrate to ambient noise
        audio = recognizer.listen(source, timeout=5)  # wait at most 5 s for speech to start

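Key parameters: sample_rate (44100 Hz recommended), timeout (how long to wait for speech to begin), and phrase_time_limit (maximum duration of a single phrase).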
Audio file processing:

    from os import path

    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "test.wav")
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = recognizer.record(source)  # read the whole file into memory

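Supported formats: WAV (PCM), AIFF, and FLAC. A 16 kHz sample rate is a common choice for speech audio; AudioFile takes the actual rate from the file itself.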
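Since the format and sample rate of an input file matter, a quick header check before handing a file to sr.AudioFile can save debugging time. The sketch below uses only the standard library wave module. Both write_test_tone and describe_wav are hypothetical helpers, and the 16 kHz mono 16-bit settings are just an example target.

```python
import math
import struct
import wave

def write_test_tone(path, sample_rate=16000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(frames)

def describe_wav(path):
    """Read the WAV header and return its basic properties."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "duration_seconds": wf.getnframes() / wf.getframerate(),
        }
```

The generated tone contains no speech, so it is only useful for checking that the file pipeline works end to end; describe_wav can be pointed at any PCM WAV file.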
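| Engine | Method | Characteristics | Typical use |
|---|---|---|---|
| Google Web Speech | recognize_google() | High accuracy, requires network access | General purpose |
| Microsoft Bing | recognize_bing() | Requires an API key | Enterprise applications |
| CMU Sphinx | recognize_sphinx() | Fully offline; Chinese needs a separately installed language pack | Privacy-sensitive scenarios |
| Wit.ai | recognize_wit() | Custom models | Specialized domains |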
Example: offline recognition with Sphinx

    # Requires the zh-CN language pack for PocketSphinx to be installed
    text = recognizer.recognize_sphinx(audio, language='zh-CN')

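One way to combine the engines in the table is to try them in order and fall back when one fails, for example an online engine first and Sphinx second. The following is a sketch under stated assumptions: recognize_with_fallback is a hypothetical helper, it looks up recognize_* methods by name, and it treats any exception as a reason to move on. Real code would more likely catch only sr.UnknownValueError and sr.RequestError.

```python
def recognize_with_fallback(recognizer, audio, engines, **kwargs):
    """Try each engine name in order; return (engine, text) from the first success.

    `engines` holds method suffixes such as "google" or "sphinx", which map to
    recognizer.recognize_google and recognizer.recognize_sphinx.
    """
    errors = {}
    for engine in engines:
        method = getattr(recognizer, f"recognize_{engine}", None)
        if method is None:
            errors[engine] = "no such method on this recognizer"
            continue
        try:
            return engine, method(audio, **kwargs)
        except Exception as exc:  # simplification: narrow this in real code
            errors[engine] = repr(exc)
    raise RuntimeError(f"all engines failed: {errors}")
```

With the real library, a call might look like recognize_with_fallback(recognizer, audio, ["google", "sphinx"], language="zh-CN"). Returning the engine name alongside the text makes it easy to log which path produced each result.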
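Real-time (background) recognition:

    import time

    def callback(recognizer, audio):
        # Called on a background thread once per captured phrase
        try:
            print(recognizer.recognize_google(audio))
        except (sr.UnknownValueError, sr.RequestError):
            pass  # skip phrases that could not be recognized

    r = sr.Recognizer()
    mic = sr.Microphone()
    # listen_in_background opens the source itself, so pass it outside any `with` block
    stop_listening = r.listen_in_background(mic, callback)
    while True:
        time.sleep(0.1)  # keep the main thread alive without busy-waiting
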
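Because the callback given to listen_in_background runs on a background thread, a common pattern is to hand results to the main thread through a queue instead of printing inside the callback. The sketch below illustrates that pattern. make_queue_callback and drain are hypothetical helpers, and the recognition function is passed in as a parameter so the pattern can be tried without a microphone.

```python
import queue

def make_queue_callback(results, recognize):
    """Build a callback with the (recognizer, audio) signature that
    listen_in_background expects, pushing each outcome onto `results`."""
    def callback(recognizer, audio):
        try:
            results.put(("ok", recognize(recognizer, audio)))
        except Exception as exc:  # report the failure instead of dropping it
            results.put(("error", repr(exc)))
    return callback

def drain(results):
    """Collect everything currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items
```

With the real library, the callback could be built as make_queue_callback(results, lambda r, a: r.recognize_google(a, language="zh-CN")) and the main loop would call drain(results) periodically. Keeping errors in the queue also avoids silently swallowing them.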
Multi-language support:

    # English
    text_en = recognizer.recognize_google(audio, language='en-US')
    # Japanese
    text_jp = recognizer.recognize_google(audio, language='ja-JP')

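Preprocessing tips:
Call adjust_for_ambient_noise() before listening so the recognizer calibrates its energy threshold to the background noise.
Post-processing tips: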

    import re

    def post_process(text):
        # Remove filler words
        text = re.sub(r'[呃啊啦]', '', text)
        # Normalize numbers
        text = re.sub(r'二零二三年', '2023年', text)
        return text

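The year replacement in post_process handles only one hard-coded year. A slightly more general sketch maps digit-by-digit Chinese numerals in front of 年 to Arabic digits. normalize_years is a hypothetical helper and deliberately narrow: it only covers four-character, digit-by-digit years and does not attempt general Chinese numbers such as 二十三.

```python
import re

# Digit-by-digit mapping; 零 and 〇 both stand for 0.
CN_DIGITS = {"零": "0", "〇": "0", "一": "1", "二": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}

# Four Chinese digits immediately followed by 年 (year).
YEAR_PATTERN = re.compile("([" + "".join(CN_DIGITS) + "]{4})年")

def normalize_years(text):
    """Replace four digit-by-digit Chinese numerals followed by 年 with Arabic digits."""
    def to_arabic(match):
        return "".join(CN_DIGITS[ch] for ch in match.group(1)) + "年"
    return YEAR_PATTERN.sub(to_arabic, text)
```

Anchoring the pattern on 年 keeps the rule from touching ordinary words that happen to contain 一 or 二.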
Error handling:

    def safe_recognize(audio):
        try:
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return "[unrecognized]"
        except sr.RequestError as e:
            return f"[service error: {e}]"
        except Exception as e:
            return f"[unknown error: {e}]"

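Transient service errors are often worth retrying before giving up. The following is a minimal sketch of exponential backoff around any recognition call. recognize_with_retry is a hypothetical helper; in real use retry_on would probably be set to (sr.RequestError,), since retrying unintelligible audio will not change the result. The default of Exception and the injectable sleep function are assumptions made so the sketch stays free of third-party imports.

```python
import time

def recognize_with_retry(recognize, audio, retries=3, base_delay=0.5,
                         retry_on=(Exception,), sleep=time.sleep):
    """Call recognize(audio); on a retryable error wait base_delay * 2**attempt, then retry.

    Re-raises the last error once `retries` attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return recognize(audio)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Errors that are not listed in retry_on propagate immediately, so the caller still sees them on the first attempt.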
Multithreaded processing:

    from concurrent.futures import ThreadPoolExecutor

    def process_audio(audio_chunk):
        return recognizer.recognize_google(audio_chunk)

    # audio_chunks is assumed to be a list of AudioData objects prepared beforehand
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(process_audio, audio_chunks))

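The thread pool code takes a list named audio_chunks as given. One way to produce it is to read a long file in fixed-length windows. The sketch below computes those windows in plain Python; chunk_windows is a hypothetical helper, the 30-second default is arbitrary, and the commented usage assumes that successive record() calls on one open AudioFile read consecutive segments.

```python
def chunk_windows(total_seconds, chunk_seconds=30.0):
    """Split [0, total_seconds) into consecutive (offset, duration) windows."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it ends exactly at total_seconds.
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += duration
    return windows

# Hypothetical use with the library (not executed here):
#   with sr.AudioFile("long.wav") as source:
#       audio_chunks = [recognizer.record(source, duration=d)
#                       for _, d in chunk_windows(source.DURATION)]
```

Cutting at fixed boundaries can split a word in two, so results near chunk edges may need manual review or overlapping windows.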
Complete example: a voice note recorder

    import datetime
    import json

    import speech_recognition as sr

    class VoiceNote:
        def __init__(self):
            self.recognizer = sr.Recognizer()
            self.notes = []

        def record_note(self):
            print(f"{datetime.datetime.now()} Recording...")
            try:
                # listen() raises sr.WaitTimeoutError if no speech starts within 10 s,
                # so it is kept inside the try block
                with sr.Microphone() as source:
                    self.recognizer.adjust_for_ambient_noise(source)
                    audio = self.recognizer.listen(source, timeout=10)
                text = self.recognizer.recognize_google(audio, language='zh-CN')
                note = {
                    'timestamp': str(datetime.datetime.now()),
                    'content': text,
                }
                self.notes.append(note)
                print("Note recorded")
                return note
            except Exception as e:
                print(f"Recording failed: {e}")
                return None

        def save_notes(self, filename):
            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(self.notes, f, ensure_ascii=False, indent=2)

    # Usage example
    if __name__ == "__main__":
        app = VoiceNote()
        while True:
            # Read the user's input once and reuse it for the exit check
            command = input("Press Enter to record a voice note, or type exit to quit... ")
            if command.strip().lower() == 'exit':
                break
            app.record_note()
        app.save_notes("voice_notes.json")

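With a solid grasp of the SpeechRecognition library, developers can quickly build applications ranging from simple voice assistants to complex voice interaction systems. In practice, validate the basic functionality with the offline engine (Sphinx) first, then choose an online service that fits your requirements.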