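Summary: This article walks through speech-to-text in C# with the System.Speech library, covering environment setup, the core implementation, exception handling, and performance tuning. It is aimed at developers who need speech recognition that runs locally.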
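System.Speech is the speech recognition and synthesis API that ships with Microsoft's .NET Framework; its recognition classes live in the System.Speech.Recognition namespace. The library has been part of the framework since .NET Framework 3.0, so nothing extra needs to be installed beyond a project reference to the System.Speech assembly. It recognizes speech offline, which makes it a good fit for privacy-sensitive projects and restricted network environments. Its main advantages are that it adds no dependencies and works without a network connection.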
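Offline recognition only works for languages whose recognizer is installed on the machine; constructing a `SpeechRecognitionEngine` for a culture with no installed recognizer throws an `ArgumentException`. A minimal sketch for checking this up front is shown here. The class name `RecognizerCheck` and its methods are hypothetical helpers, not part of the library.

```csharp
using System;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical helper for inspecting the recognizers installed on this machine.
static class RecognizerCheck
{
    // Returns the first installed recognizer whose culture name matches
    // (for example "zh-CN"), or null when none is installed.
    public static RecognizerInfo FindRecognizer(string cultureName)
    {
        return SpeechRecognitionEngine.InstalledRecognizers()
            .FirstOrDefault(r => string.Equals(
                r.Culture.Name, cultureName, StringComparison.OrdinalIgnoreCase));
    }

    // Prints every installed recognizer so a missing language pack is easy to spot.
    public static void PrintInstalled()
    {
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine($"{info.Culture.Name}: {info.Name}");
        }
    }
}
```

If `RecognizerCheck.FindRecognizer("zh-CN")` returns null, the Chinese speech language pack probably needs to be installed in Windows before the rest of the code in this article can run.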
```csharp
using System.Speech.Recognition;
using System.Speech.AudioFormat;
```
```csharp
// Create a Chinese recognition engine (swap the culture for another language)
SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("zh-CN"));
```
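```csharp
// Build a small set of command words (start, stop, save, exit)
Choices commands = new Choices();
commands.Add(new string[] { "开始", "停止", "保存", "退出" });

// Create the grammar builder; its culture must match the recognizer's,
// otherwise LoadGrammar fails on machines whose default culture is not zh-CN
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Culture = recognizer.RecognizerInfo.Culture;
grammarBuilder.Append(commands);

// Create the grammar object and load it into the engine
Grammar commandsGrammar = new Grammar(grammarBuilder);
recognizer.LoadGrammar(commandsGrammar);
```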
```csharp
// Load an SRGS grammar file (XML format)
Grammar srgsGrammar = new Grammar(@"C:\grammars\command.xml");
recognizer.LoadGrammar(srgsGrammar);
```
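The source does not show the contents of `command.xml`. As an illustration only, a minimal SRGS 1.0 grammar equivalent to the four-word command list might look like the following; the rule id `command` is an assumption chosen for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical command.xml: one public root rule offering four command words -->
<grammar version="1.0" xml:lang="zh-CN" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command" scope="public">
    <one-of>
      <item>开始</item>
      <item>停止</item>
      <item>保存</item>
      <item>退出</item>
    </one-of>
  </rule>
</grammar>
```

The `xml:lang` value has to match the culture of the recognizer that loads the file, in the same way a `GrammarBuilder` culture does.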
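```csharp
// Fired when a phrase is recognized against a loaded grammar
recognizer.SpeechRecognized += (sender, e) =>
{
    if (e.Result.Confidence > 0.7) // confidence threshold
    {
        Console.WriteLine($"Recognized: {e.Result.Text} (confidence: {e.Result.Confidence:P0})");
    }
};

// Fired with interim hypotheses while the user is still speaking
// (this is not a failure event; rejections raise SpeechRecognitionRejected)
recognizer.SpeechHypothesized += (sender, e) =>
{
    Console.WriteLine($"Hypothesis: {e.Result.Text}");
};

// Fired when a recognition operation ends; Error is set if it ended abnormally
recognizer.RecognizeCompleted += (sender, e) =>
{
    if (e.Error != null)
    {
        Console.WriteLine($"Recognition error: {e.Error.Message}");
    }
};
```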
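```csharp
try
{
    // Use the default microphone as input
    recognizer.SetInputToDefaultAudioDevice();

    // Start asynchronous, continuous recognition
    recognizer.RecognizeAsync(RecognizeMode.Multiple);
    Console.WriteLine("Speech recognition started, please speak...");
    Console.WriteLine("Type '退出' to end the program");

    // Block the main thread until the user types the exit word
    while (true)
    {
        string input = Console.ReadLine();
        // ReadLine returns null at end of input, so treat that as exit too
        if (input == null || input.Trim() == "退出")
            break;
    }

    // Stop recognition
    recognizer.RecognizeAsyncStop();
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Initialization error: {ex.Message}");
}
finally
{
    recognizer.Dispose();
}
```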
```csharp
// Add inside the SpeechRecognized handler
if (e.Result.Confidence < 0.6)
{
    // Handle the low-confidence result (for example, ignore it)
    return;
}
```
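The snippets in this article use two different thresholds (accept above 0.7, discard below 0.6). One way to reconcile them, sketched here as an assumption rather than the author's design, is a three-tier policy in which the band in between asks the user to confirm. `Decision` and `ConfidencePolicy` are hypothetical names.

```csharp
using System;

// Hypothetical outcome of evaluating a recognition confidence score.
enum Decision { Accept, Confirm, Reject }

static class ConfidencePolicy
{
    // Thresholds borrowed from the snippets in this article; tune them for real audio.
    public const float AcceptAt = 0.7f;
    public const float RejectBelow = 0.6f;

    // Maps a confidence in [0, 1] to one of three tiers.
    public static Decision Classify(float confidence)
    {
        if (confidence >= AcceptAt) return Decision.Accept;
        if (confidence < RejectBelow) return Decision.Reject;
        return Decision.Confirm;
    }
}
```

Inside the `SpeechRecognized` handler this could be used as `switch (ConfidencePolicy.Classify(e.Result.Confidence))`, executing the command on `Accept`, prompting on `Confirm`, and returning on `Reject`.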
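```csharp
// Declare the input format: 16 kHz, 16-bit, mono.
// audioStream is not defined in this excerpt; it must be a Stream that
// already contains PCM audio in exactly this format.
recognizer.SetInputToAudioStream(
    audioStream,
    new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
```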
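```csharp
// Offload result handling with Task.Run (requires using System.Threading.Tasks)
recognizer.SpeechRecognized += (sender, e) =>
{
    Task.Run(() =>
    {
        // Time-consuming work (such as a database write)
    });
};
```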
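A separate `Task.Run` per result keeps the handler fast, but the tasks may finish out of order. When the order of recognized commands matters, a single background consumer is one alternative. The sketch below swaps in a `BlockingCollection<string>` for that purpose; `ResultQueue` is a hypothetical helper, not something from the source.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: queues recognized text and processes it on one background task,
// so slow work never blocks the recognizer and results keep their arrival order.
sealed class ResultQueue : IDisposable
{
    private readonly BlockingCollection<string> _items = new BlockingCollection<string>();
    private readonly Task _worker;

    public ResultQueue(Action<string> process)
    {
        _worker = Task.Run(() =>
        {
            // GetConsumingEnumerable blocks until an item arrives
            // and ends once CompleteAdding has been called and the queue is empty.
            foreach (string text in _items.GetConsumingEnumerable())
            {
                process(text);
            }
        });
    }

    // Called from the SpeechRecognized handler; returns immediately.
    public void Enqueue(string text) => _items.Add(text);

    // Stops accepting items and waits for the queued ones to finish.
    public void Dispose()
    {
        _items.CompleteAdding();
        _worker.Wait();
        _items.Dispose();
    }
}
```

The handler then shrinks to `recognizer.SpeechRecognized += (s, e) => queue.Enqueue(e.Result.Text);`, with the database write or other slow work living in the `process` callback.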
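Solution:

```csharp
// Add user-specific speech samples
// (adapter is not defined in this excerpt; presumably a SpeechRecognitionEngine)
adapter.LoadGrammar(new DictationGrammar());
```

```csharp
using (var tempRecognizer = new SpeechRecognitionEngine())
{
    // Temporary use; the engine is disposed when the block ends
}
```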
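```csharp
// Load several grammars. Grammar has no (Choices, culture) constructor, and every
// grammar's culture must match the engine's, so set it on the GrammarBuilder instead.
var culture = recognizer.RecognizerInfo.Culture;
var mixedGreetings = new GrammarBuilder(new Choices("你好", "hello")) { Culture = culture };
recognizer.LoadGrammar(new Grammar(mixedGreetings));
var englishGreetings = new GrammarBuilder(new Choices("Hi", "Hello")) { Culture = culture };
recognizer.LoadGrammar(new Grammar(englishGreetings));
// Note: English words are matched by the zh-CN engine here, so accuracy may be poor;
// genuine multi-language recognition needs one engine per culture.

// Return up to 3 candidate results per utterance (this is not a priority setting)
recognizer.MaxAlternates = 3;
```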
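With `MaxAlternates` above 1, each result exposes its candidates through `e.Result.Alternates`, where every entry has a `Text` and a `Confidence`. A small sketch of choosing among them follows; it assumes the goal is to pick the best candidate that is a known command, which is mainly useful when a dictation grammar is loaded alongside command grammars. `AlternatePicker` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for choosing among recognition alternates.
static class AlternatePicker
{
    // Picks the highest-confidence candidate whose text is a known command,
    // or returns null when no candidate matches. Candidates are (text, confidence)
    // pairs, for example projected from e.Result.Alternates.
    public static string PickCommand(
        IEnumerable<KeyValuePair<string, float>> candidates,
        ISet<string> knownCommands)
    {
        return candidates
            .Where(c => knownCommands.Contains(c.Key))
            .OrderByDescending(c => c.Value)
            .Select(c => c.Key)
            .FirstOrDefault();
    }
}
```

In a handler, the candidates could be produced with `e.Result.Alternates.Select(a => new KeyValuePair<string, float>(a.Text, a.Confidence))`.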
```csharp
using System;
using System.Speech.Recognition;
using System.Threading.Tasks;

class VoiceRecognitionDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("zh-CN")))
        {
            try
            {
                // Configure the grammar (start, stop, save, exit);
                // the builder's culture must match the recognizer's
                var commands = new Choices(new[] { "开始", "停止", "保存", "退出" });
                var builder = new GrammarBuilder(commands) { Culture = recognizer.RecognizerInfo.Culture };
                var grammar = new Grammar(builder);
                recognizer.LoadGrammar(grammar);

                // Handle recognized commands
                recognizer.SpeechRecognized += (s, e) =>
                {
                    if (e.Result.Confidence > 0.7)
                    {
                        Console.WriteLine($"Command: {e.Result.Text}");
                        if (e.Result.Text == "退出")
                            Environment.Exit(0);
                    }
                };

                recognizer.SetInputToDefaultAudioDevice();
                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.WriteLine("Voice command system started (say '退出' to quit)");
                Console.ReadLine(); // keep the program running
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }
    }
}
```
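| Option | Pros | Cons | Best suited for |
|---|---|---|---|
| System.Speech | No dependencies, works offline | Limited features; Chinese recognition accuracy around 85% | Intranet environments, privacy-sensitive projects |
| Microsoft Speech SDK | Higher accuracy | Requires an API key | Commercial applications with network access |
| CMUSphinx | Open source, cross-platform | Complex configuration | Linux and embedded systems |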
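The System.Speech approach presented in this article is especially suited to scenarios that need rapid deployment and cannot depend on the network. For more demanding commercial applications, Method 2 (a cloud-based approach built on the Azure Speech SDK) is worth exploring next.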