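Summary: This article walks through calling the Baidu speech recognition API from a C# project. It covers environment setup, obtaining API keys, the core implementation, and error handling, so that developers can add speech-to-text quickly.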
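Baidu's speech recognition API is a leading speech recognition service in China. It supports two modes, real-time streaming recognition and asynchronous file recognition, and covers Chinese, English, and dialect recognition. In C# development it can be applied to intelligent customer service, meeting-minutes generation, voice navigation, and similar scenarios. Compared with an on-premises deployment, calling the API keeps maintenance costs low while delivering high recognition accuracy, which makes it a good fit for small and medium-sized businesses that need speech processing quickly.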
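Install the required components through NuGet (Package Manager Console):

    Install-Package Newtonsoft.Json   # JSON handling
    Install-Package System.Net.Http   # HTTP support (only needed on older .NET Framework targets; built into .NET Core and later)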
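Obtain the following credentials:

- API Key and Secret Key
- AppID (required by some interfaces)

Security recommendation: store the keys in environment variables or a configuration file rather than hard-coding them in source code.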
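The security recommendation above can be sketched as a small loader that reads both keys from environment variables and fails fast when one is missing. The variable names `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` and the `BaiduCredentials` class are assumptions made for illustration; they are not defined by Baidu or by this article.

```csharp
using System;

public static class BaiduCredentials
{
    // Hypothetical variable names; use whatever your deployment defines.
    public const string ApiKeyVariable = "BAIDU_API_KEY";
    public const string SecretKeyVariable = "BAIDU_SECRET_KEY";

    // Reads both keys from the process environment.
    public static (string ApiKey, string SecretKey) Load()
    {
        return Load(Environment.GetEnvironmentVariable);
    }

    // Overload with an injectable lookup, so the logic can be exercised
    // without touching the real environment.
    public static (string ApiKey, string SecretKey) Load(Func<string, string> lookup)
    {
        return (Require(lookup, ApiKeyVariable), Require(lookup, SecretKeyVariable));
    }

    private static string Require(Func<string, string> lookup, string name)
    {
        string value = lookup(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        }
        return value.Trim();
    }
}
```

A caller would then write `var (apiKey, secretKey) = BaiduCredentials.Load();` at startup and pass the values into whatever service class wraps the API.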
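    // Requires: using System; using System.Net.Http; using System.Threading.Tasks; using Newtonsoft.Json;
    public async Task<string> GetAccessToken()
    {
        // Escape the keys so characters such as '+' or '&' cannot corrupt the query string
        string authUrl = "https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials"
            + $"&client_id={Uri.EscapeDataString(API_KEY)}"
            + $"&client_secret={Uri.EscapeDataString(SECRET_KEY)}";

        using (HttpClient client = new HttpClient())
        {
            HttpResponseMessage response = await client.GetAsync(authUrl);
            string result = await response.Content.ReadAsStringAsync();
            dynamic json = JsonConvert.DeserializeObject(result);

            // A failed request carries "error"/"error_description" instead of "access_token"
            if (json.access_token == null)
            {
                throw new Exception($"Token request failed: {result}");
            }
            return json.access_token.ToString();
        }
    }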
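Key point: the token is valid for 30 days (the response's `expires_in` field gives the lifetime in seconds), so cache it and refresh it automatically instead of requesting a new one for every call.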
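One way to approach the automatic refresh is a small cache that remembers when the token expires and fetches a new one shortly before that. The sketch below is a minimal, single-threaded illustration; the `CachedToken` class, its injected `fetch` and `now` delegates, and the safety margin are assumptions introduced here, not part of Baidu's SDK.

```csharp
using System;
using System.Threading.Tasks;

public sealed class CachedToken
{
    private readonly Func<Task<(string Token, TimeSpan Lifetime)>> _fetch;
    private readonly Func<DateTimeOffset> _now;
    private readonly TimeSpan _margin;
    private string _token;
    private DateTimeOffset _expiresAt;

    // fetch:  requests a fresh token and reports its lifetime (e.g. from expires_in)
    // now:    clock, injectable so expiry logic can be tested
    // margin: how long before expiry the token is treated as stale
    public CachedToken(Func<Task<(string Token, TimeSpan Lifetime)>> fetch,
                       Func<DateTimeOffset> now,
                       TimeSpan margin)
    {
        _fetch = fetch;
        _now = now;
        _margin = margin;
    }

    // Returns the cached token, refreshing it first if it is missing or near expiry.
    // Not thread-safe: concurrent callers may trigger more than one refresh.
    public async Task<string> GetAsync()
    {
        if (_token == null || _now() + _margin >= _expiresAt)
        {
            var (token, lifetime) = await _fetch();
            _token = token;
            _expiresAt = _now() + lifetime;
        }
        return _token;
    }
}
```

In a real service, `fetch` would wrap the token request and convert `expires_in` to a `TimeSpan`, and `now` would simply be `() => DateTimeOffset.UtcNow`.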
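    // Requires: using System; using System.IO; using System.Net.Http; using System.Text;
    //           using System.Threading.Tasks; using Newtonsoft.Json;
    public async Task<string> RecognizeAudio(string filePath, string accessToken)
    {
        // 1. Read the audio file (wav/pcm; 16 kHz, 16-bit, mono)
        byte[] audioData = File.ReadAllBytes(filePath);

        // 2. Build the JSON request body; the audio travels Base64-encoded in "speech"
        string body = JsonConvert.SerializeObject(new
        {
            format = Path.GetExtension(filePath).TrimStart('.').ToLowerInvariant(), // "wav" or "pcm"
            rate = 16000,            // sample rate in Hz
            channel = 1,             // mono
            cuid = "YOUR_DEVICE_ID", // unique client identifier
            token = accessToken,
            dev_pid = 1537,          // recognition model: Mandarin
            speech = Convert.ToBase64String(audioData),
            len = audioData.Length   // byte length before Base64 encoding
        });

        using (var client = new HttpClient())
        using (var content = new StringContent(body, Encoding.UTF8, "application/json"))
        {
            var response = await client.PostAsync("https://vop.baidu.com/server_api", content);
            var result = await response.Content.ReadAsStringAsync();
            dynamic json = JsonConvert.DeserializeObject(result);

            // 3. err_no == 0 means success; "result" is an array of candidate transcripts
            if ((int)json.err_no == 0)
            {
                return json.result[0].ToString();
            }
            throw new Exception($"Recognition failed: {json.err_msg}");
        }
    }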
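Parameters:

- `format`: audio format (wav, pcm, amr, and so on)
- `rate`: sample rate in Hz (8000 or 16000)
- `channel`: number of channels; the short-speech interface accepts mono only, so pass 1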
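Because a mismatch between these parameters and the actual file is a common cause of failed recognition, it can help to inspect the WAV header before uploading. The sketch below assumes a canonical 44-byte PCM header with the `fmt ` chunk immediately after the RIFF header; files with extra chunks would need a fuller parser. The `WavHeader` class is hypothetical, and the accepted values mirror the parameter list above.

```csharp
using System;
using System.Text;

public static class WavHeader
{
    // Reads channel count, sample rate and bit depth from a canonical PCM WAV header.
    public static (int Channels, int SampleRate, int BitsPerSample) Read(byte[] wav)
    {
        if (wav == null || wav.Length < 44
            || Encoding.ASCII.GetString(wav, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(wav, 8, 4) != "WAVE"
            || Encoding.ASCII.GetString(wav, 12, 4) != "fmt ")
        {
            throw new FormatException("Not a canonical RIFF/WAVE file.");
        }

        int channels = ReadLittleEndian(wav, 22, 2);   // NumChannels
        int sampleRate = ReadLittleEndian(wav, 24, 4); // SampleRate
        int bits = ReadLittleEndian(wav, 34, 2);       // BitsPerSample
        return (channels, sampleRate, bits);
    }

    // True when the file is mono, 16-bit, and sampled at 8 kHz or 16 kHz.
    public static bool IsSupported(byte[] wav)
    {
        var (channels, sampleRate, bits) = Read(wav);
        return channels == 1 && bits == 16 && (sampleRate == 8000 || sampleRate == 16000);
    }

    // WAV fields are little-endian regardless of the host CPU.
    private static int ReadLittleEndian(byte[] data, int offset, int count)
    {
        int value = 0;
        for (int i = 0; i < count; i++)
        {
            value |= data[offset + i] << (8 * i);
        }
        return value;
    }
}
```

Calling `WavHeader.IsSupported(File.ReadAllBytes(path))` before the upload turns a vague server-side error into an early, local one.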
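    // Requires: using System; using System.IO; using System.Net.WebSockets; using System.Text;
    //           using System.Threading; using System.Threading.Tasks; using Newtonsoft.Json;
    public async Task<string> StreamRecognize(Stream audioStream, string accessToken)
    {
        // Endpoint and message fields are illustrative; confirm them against
        // Baidu's current real-time recognition documentation.
        string wsUrl = $"wss://vop.baidu.com/ws_api?token={accessToken}";

        // ClientWebSocket ships with .NET (System.Net.WebSockets); no third-party library is needed
        using (var ws = new ClientWebSocket())
        {
            await ws.ConnectAsync(new Uri(wsUrl), CancellationToken.None);

            // Send the configuration message
            string config = JsonConvert.SerializeObject(new
            {
                format = "wav",
                rate = 16000,
                channel = 1,
                token = accessToken
            });
            await ws.SendAsync(
                new ArraySegment<byte>(Encoding.UTF8.GetBytes(config)),
                WebSocketMessageType.Text, true, CancellationToken.None);

            // Send the audio in chunks, one binary message per chunk
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = await audioStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await ws.SendAsync(
                    new ArraySegment<byte>(buffer, 0, bytesRead),
                    WebSocketMessageType.Binary, true, CancellationToken.None);
            }

            // Receive the recognition results (message parsing still has to be implemented)
            // ...
            return string.Empty; // placeholder so the method compiles until parsing is added
        }
    }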
Key points for streaming:
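| Error code | Cause | Solution |
|---|---|---|
| 100 | Invalid token | Check that the keys are valid |
| 110 | Access frequency limit exceeded | Retry with exponential backoff |
| 111 | Internal server error | Check that the audio format meets the requirements |
| 1405 | Audio too long | Process in segments (no more than 60 s each) |

Note that the short-speech endpoint reports its own failures through `err_no`, with values in the 33xx range (for example, 3302 for an authentication failure and 3308 for audio that is too long), so check each response against Baidu's current error-code reference.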
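The exponential backoff suggested for rate-limit errors can be sketched as a generic retry helper. The `Retry` class, its default of four attempts, and the 500 ms base delay are assumptions chosen for illustration; which exceptions count as transient is left to the caller through `isTransient`.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Runs action, retrying transient failures with delays of base, 2x base, 4x base, ...
    // The delay function is injectable so the schedule can be tested without waiting.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> action,
        Func<Exception, bool> isTransient,
        int maxAttempts = 4,
        TimeSpan? baseDelay = null,
        Func<TimeSpan, Task> delay = null)
    {
        TimeSpan first = baseDelay ?? TimeSpan.FromMilliseconds(500);
        if (delay == null)
        {
            delay = Task.Delay;
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Double the wait after every failed attempt
                await delay(TimeSpan.FromTicks(first.Ticks << (attempt - 1)));
            }
        }
    }
}
```

A recognition call could be wrapped as `await Retry.WithBackoffAsync(() => RecognizeAudio(path, token), ex => IsRateLimit(ex))`, where `IsRateLimit` is a hypothetical predicate that inspects the error code. Once the attempts are exhausted, or when the failure is not transient, the original exception propagates to the caller.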
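Use `Task.Run` to move time-consuming, CPU-bound work (such as slicing or converting audio) onto a background thread so the calling thread stays responsive; the HTTP calls themselves are already asynchronous and only need `await`.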
    public class BaiduASRService
    {
        private readonly string API_KEY;
        private readonly string SECRET_KEY;

        public BaiduASRService(string apiKey, string secretKey)
        {
            API_KEY = apiKey;
            SECRET_KEY = secretKey;
        }

        public async Task<string> RecognizeFileAsync(string filePath)
        {
            try
            {
                string token = await GetAccessToken();
                return await RecognizeAudio(filePath, token);
            }
            catch (Exception ex)
            {
                // Replace with real logging
                Console.WriteLine($"Recognition error: {ex.Message}");
                throw;
            }
        }

        // Methods from the earlier sections go here...
    }

    // Usage example
    var service = new BaiduASRService("your_api_key", "your_secret_key");
    var result = await service.RecognizeFileAsync("test.wav");
    Console.WriteLine(result);
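    public async Task<List<string>> RecognizeLongAudio(string filePath)
    {
        var results = new List<string>();

        // The input is assumed to be headerless PCM (16 kHz, 16-bit, mono), so any byte range
        // that starts on a sample boundary is valid audio. Strip the header from a WAV file first.
        byte[] fullAudio = File.ReadAllBytes(filePath);
        string token = await GetAccessToken(); // one token serves every segment

        // 60-second segments: 16000 samples/s * 2 bytes/sample * 60 s
        int segmentSize = 16000 * 2 * 60;
        for (int i = 0; i < fullAudio.Length; i += segmentSize)
        {
            int length = Math.Min(segmentSize, fullAudio.Length - i);
            byte[] segment = new byte[length];
            Array.Copy(fullAudio, i, segment, 0, length);

            // RecognizeAudio takes a file path, so stage each segment in a temporary .pcm file
            string tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pcm");
            File.WriteAllBytes(tempPath, segment);
            try
            {
                results.Add(await RecognizeAudio(tempPath, token));
            }
            finally
            {
                File.Delete(tempPath);
            }
        }
        return results;
    }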
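The segment-size arithmetic used for long audio generalizes to other sample rates and bit depths. As a sketch, the hypothetical `PcmSegmenter` helper below computes the byte ranges for raw PCM of a given length; its defaults (16 kHz, 2 bytes per sample, mono, 60 s) are the values assumed in this article.

```csharp
using System;
using System.Collections.Generic;

public static class PcmSegmenter
{
    // Returns (offset, length) byte ranges that cover totalBytes of raw PCM,
    // each holding at most maxSeconds of audio.
    public static List<(int Offset, int Length)> Split(
        int totalBytes,
        int sampleRate = 16000,
        int bytesPerSample = 2,
        int channels = 1,
        int maxSeconds = 60)
    {
        if (totalBytes < 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));

        // Bytes per second = samples per second * bytes per sample * channels
        int segmentBytes = sampleRate * bytesPerSample * channels * maxSeconds;
        if (segmentBytes <= 0) throw new ArgumentException("Audio parameters must be positive.");

        var ranges = new List<(int Offset, int Length)>();
        for (int offset = 0; offset < totalBytes; offset += segmentBytes)
        {
            ranges.Add((offset, Math.Min(segmentBytes, totalBytes - offset)));
        }
        return ranges;
    }
}
```

For a 10-minute recording at the default settings (19,200,000 bytes), `Split` yields ten ranges of 1,920,000 bytes each. Because the segment size is a multiple of the sample size, every range starts on a sample boundary.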
    // Real-time captions in a WPF window
    public async Task StartRealTimeCaption(Stream audioStream)
    {
        string token = await GetAccessToken();
        var ws = new ClientWebSocket();
        await ws.ConnectAsync(new Uri($"wss://vop.baidu.com/ws_api?token={token}"), CancellationToken.None);
        // Send the configuration...

        // Create the caption window
        var captionWindow = new CaptionWindow();
        captionWindow.Show();

        // Receive messages on a background task
        _ = Task.Run(async () =>
        {
            var buffer = new byte[4096];
            while (ws.State == WebSocketState.Open)
            {
                var segment = new ArraySegment<byte>(buffer);
                var result = await ws.ReceiveAsync(segment, CancellationToken.None);
                if (result.MessageType == WebSocketMessageType.Text)
                {
                    string message = Encoding.UTF8.GetString(buffer, 0, result.Count);
                    dynamic json = JsonConvert.DeserializeObject(message);
                    if ((string)json.result_type == "partial_result")
                    {
                        string caption = json.result.ToString();
                        // WPF controls may only be touched from the UI thread
                        captionWindow.Dispatcher.Invoke(() => captionWindow.UpdateCaption(caption));
                    }
                }
            }
        });
        // Send the audio data...
    }
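| Test scenario | Input data | Expected result |
|---|---|---|
| Standard Mandarin | 16 kHz WAV file | Recognition accuracy > 95% |
| Background noise | Recording made in a café | Recognition accuracy > 85% |
| Domain terminology | Medical recording | Key terms recognized correctly |
| Long audio | 10-minute recording | All segments processed |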
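To check results against accuracy thresholds like those in the table, the transcript has to be scored against a reference text. The article does not define its accuracy metric, so the sketch below assumes one common choice: character accuracy, computed as 1 minus the character error rate (edit distance divided by reference length). The `RecognitionAccuracy` class is hypothetical.

```csharp
using System;

public static class RecognitionAccuracy
{
    // Levenshtein distance: minimum number of character insertions,
    // deletions and substitutions that turn reference into hypothesis.
    public static int EditDistance(string reference, string hypothesis)
    {
        var prev = new int[hypothesis.Length + 1];
        var curr = new int[hypothesis.Length + 1];
        for (int j = 0; j <= hypothesis.Length; j++) prev[j] = j;

        for (int i = 1; i <= reference.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= hypothesis.Length; j++)
            {
                int cost = reference[i - 1] == hypothesis[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(prev[j] + 1, curr[j - 1] + 1), // deletion, insertion
                    prev[j - 1] + cost);                    // match or substitution
            }
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[hypothesis.Length];
    }

    // Character accuracy in [0, 1], defined here as 1 - CER and clamped at 0.
    public static double CharacterAccuracy(string reference, string hypothesis)
    {
        if (reference.Length == 0)
        {
            return hypothesis.Length == 0 ? 1.0 : 0.0;
        }
        double cer = (double)EditDistance(reference, hypothesis) / reference.Length;
        return Math.Max(0.0, 1.0 - cer);
    }
}
```

A test for the standard Mandarin scenario could then assert `RecognitionAccuracy.CharacterAccuracy(expectedText, recognizedText) > 0.95`. Punctuation and whitespace should be normalized on both sides first, since the recognizer may not emit them the way the reference does.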
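With the implementation described in this article, developers can quickly build stable speech recognition into a C# application. In real projects, tune the parameters and strengthen the error handling to suit the specific business scenario.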