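Summary: This article explains in detail how to integrate Vosk into Unity for offline speech recognition. It covers environment setup, model loading, audio processing, recognition logic, and performance optimization, with complete code examples and practical advice.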
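In Unity development, voice interaction that relies on cloud APIs often suffers from high latency, privacy risks, and no offline availability. Vosk is an open-source offline speech recognition toolkit that supports models for multiple languages and needs no network connection, which makes it a strong choice for Unity developers who want on-device speech recognition.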
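Typical use cases include educational software, industrial control panels, in-vehicle systems, and other domains that need reliable voice interaction. In one industrial AR maintenance system, integrating Vosk reportedly cut the response time for voice-reported equipment faults from 3 seconds to real time, while also avoiding the risk of leaking production data.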
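Download the Vosk library
Get the library file for each target platform from the Vosk website:
- vosk.dll (Windows)
- libvosk.dylib (macOS)
- libvosk.so (Linux, Android)
- Vosk.framework (iOS)

Unity plugin configuration
Create a Plugins folder and store the library files by platform:

    Assets/
    ├── Plugins/
    │   ├── x86_64/vosk.dll       # Windows 64-bit
    │   ├── x86/vosk.dll          # Windows 32-bit
    │   ├── Android/libvosk.so    # Android
    │   └── iOS/Vosk.framework    # iOS
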
Prepare the model files
Download the full Chinese model (about 1.8 GB) or the small version (500 MB):

    wget https://github.com/alphacep/vosk-models/releases/download/v0.15/vosk-model-small-cn-0.15.zip
    unzip vosk-model-small-cn-0.15.zip -d Assets/StreamingAssets/

Set the model path to Assets/StreamingAssets/vosk-model-small-cn-0.15.
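Use Unity's Microphone class to capture audio in real time:

    using UnityEngine;

    public class AudioCapture : MonoBehaviour
    {
        private const int SampleRate = 44100;

        private AudioClip clip;
        private string deviceName;
        private int lastPos;  // read position (in samples) after the previous call

        void Start()
        {
            if (Microphone.devices.Length > 0)
            {
                deviceName = Microphone.devices[0];
                // Looping clip used as a 1-second ring buffer
                clip = Microphone.Start(deviceName, true, 1, SampleRate);
            }
        }

        // Returns only the samples recorded since the previous call, so that
        // no audio is fed to the recognizer twice. Returns null if nothing is new.
        public float[] GetAudioData()
        {
            if (clip == null) return null;

            int pos = Microphone.GetPosition(deviceName);
            int count = pos - lastPos;
            if (count < 0) count += clip.samples;  // the write head wrapped around
            if (count == 0) return null;

            // Microphone clips are normally mono, so channels is expected to be 1
            float[] result = new float[count * clip.channels];
            // For reads that run past the end of the clip, GetData wraps to the start
            clip.GetData(result, lastPos);
            lastPos = pos;
            return result;
        }

        void OnDestroy()
        {
            if (clip != null) Microphone.End(deviceName);
        }
    }
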
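The capture code above records at 44100 Hz, while the tuning notes later in this article recommend 16000 Hz to reduce computation. The simplest route is to request 16000 Hz from Microphone.Start directly. Where a device does not offer that rate, the samples can be downsampled in C# before recognition. The following is a minimal sketch under stated assumptions, not part of the original project: AudioResampler and Downsample are hypothetical names, and the block averaging used here is only a crude low-pass filter, so a proper resampler would give cleaner audio.

```csharp
using System;

// Hypothetical helper (not part of Vosk or Unity): reduces the sample rate
// of mono audio by averaging the source samples that fall into each output slot.
public static class AudioResampler
{
    public static float[] Downsample(float[] input, int srcRate, int dstRate)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (srcRate <= 0 || dstRate <= 0) throw new ArgumentOutOfRangeException(nameof(srcRate));

        // Nothing to do when the target rate is not lower than the source rate
        if (dstRate >= srcRate) return (float[])input.Clone();

        int outLen = (int)((long)input.Length * dstRate / srcRate);
        float[] output = new float[outLen];
        double step = (double)srcRate / dstRate;  // source samples per output sample

        for (int i = 0; i < outLen; i++)
        {
            int start = (int)(i * step);
            // Always average at least one sample and never read past the input
            int end = Math.Min(input.Length, Math.Max(start + 1, (int)((i + 1) * step)));

            float sum = 0f;
            for (int j = start; j < end; j++) sum += input[j];
            output[i] = sum / (end - start);
        }
        return output;
    }
}
```

If the audio is downsampled this way, the recognizer has to be created with the matching rate, for example new VoskRecognizer(modelPath, 16000f). Because each chunk is converted independently, a few leftover samples at chunk boundaries are dropped, which is acceptable for a sketch but worth handling in production code.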
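Call the native Vosk API through P/Invoke:

    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    public class VoskRecognizer : IDisposable
    {
        // Vosk C API imports (declared in vosk_api.h)
        [DllImport("vosk")]
        private static extern IntPtr vosk_model_new(string modelPath);
        [DllImport("vosk")]
        private static extern void vosk_model_free(IntPtr model);
        [DllImport("vosk")]
        private static extern IntPtr vosk_recognizer_new(IntPtr model, float sampleRate);
        // Accepts 16-bit PCM samples; returns 1 once an utterance is complete
        [DllImport("vosk")]
        private static extern int vosk_recognizer_accept_waveform_s(IntPtr recognizer, short[] data, int length);
        // The returned char* is owned by Vosk, so it is marshalled as IntPtr rather than string
        [DllImport("vosk")]
        private static extern IntPtr vosk_recognizer_result(IntPtr recognizer);
        [DllImport("vosk")]
        private static extern void vosk_recognizer_free(IntPtr recognizer);

        private IntPtr modelHandle;
        private IntPtr recognizerHandle;
        private bool disposed = false;

        public VoskRecognizer(string modelPath, float sampleRate = 44100f)
        {
            modelHandle = vosk_model_new(modelPath);
            if (modelHandle == IntPtr.Zero)
                throw new Exception("Failed to load Vosk model");
            recognizerHandle = vosk_recognizer_new(modelHandle, sampleRate);
            if (recognizerHandle == IntPtr.Zero)
            {
                vosk_model_free(modelHandle);
                throw new Exception("Failed to create Vosk recognizer");
            }
        }

        // Feeds mono samples in [-1, 1]. Returns the result JSON once an
        // utterance has ended, otherwise null.
        public string ProcessAudio(float[] audioData)
        {
            // Unity delivers floats in [-1, 1]; Vosk expects 16-bit PCM
            short[] pcm = new short[audioData.Length];
            for (int i = 0; i < audioData.Length; i++)
            {
                float s = Math.Max(-1f, Math.Min(1f, audioData[i]));
                pcm[i] = (short)(s * short.MaxValue);
            }

            int state = vosk_recognizer_accept_waveform_s(recognizerHandle, pcm, pcm.Length);
            if (state <= 0) return null;  // 0: utterance continues, -1: error
            return PtrToUtf8(vosk_recognizer_result(recognizerHandle));
        }

        // Copies a null-terminated UTF-8 string out of native memory
        private static string PtrToUtf8(IntPtr ptr)
        {
            if (ptr == IntPtr.Zero) return null;
            int len = 0;
            while (Marshal.ReadByte(ptr, len) != 0) len++;
            byte[] bytes = new byte[len];
            Marshal.Copy(ptr, bytes, 0, len);
            return Encoding.UTF8.GetString(bytes);
        }

        public void Dispose()
        {
            if (!disposed)
            {
                vosk_recognizer_free(recognizerHandle);
                vosk_model_free(modelHandle);
                disposed = true;
            }
        }
    }
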
A manager component ties audio capture and recognition together:

    using System.Collections;
    using System.IO;
    using UnityEngine;

    public class SpeechRecognitionManager : MonoBehaviour
    {
        private AudioCapture audioCapture;
        private VoskRecognizer voskRecognizer;
        private string modelPath;

        void Start()
        {
            // StreamingAssets must be reached through Application.streamingAssetsPath.
            // On Android this path points inside the APK, so the model has to be
            // copied to a regular directory (e.g. Application.persistentDataPath) first.
            modelPath = Path.Combine(Application.streamingAssetsPath, "vosk-model-small-cn-0.15");
            audioCapture = GetComponent<AudioCapture>();
            voskRecognizer = new VoskRecognizer(modelPath);
            StartCoroutine(ContinuousRecognition());
        }

        private IEnumerator ContinuousRecognition()
        {
            while (true)
            {
                float[] audioData = audioCapture.GetAudioData();
                if (audioData != null && audioData.Length > 0)
                {
                    string result = voskRecognizer.ProcessAudio(audioData);
                    if (!string.IsNullOrEmpty(result))
                    {
                        Debug.Log("Recognition result: " + result);
                        // Handle the recognition result...
                    }
                }
                yield return new WaitForSeconds(0.1f);
            }
        }

        void OnDestroy()
        {
            voskRecognizer?.Dispose();
        }
    }

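The string returned by vosk_recognizer_result is JSON with the recognized words in a text field (for example {"text": "打开 灯"}), not plain text, so it usually needs parsing before the application can act on it. Below is a minimal sketch using Unity's built-in JsonUtility. VoskResult and VoskResultParser are hypothetical helper names introduced for illustration; they are not part of Vosk or of the original project.

```csharp
using System;
using UnityEngine;

// Hypothetical DTO mirroring the "text" field of a Vosk final result.
[Serializable]
public class VoskResult
{
    public string text;
}

// Hypothetical helper that extracts the recognized text from the result JSON.
public static class VoskResultParser
{
    public static string ExtractText(string json)
    {
        if (string.IsNullOrEmpty(json)) return string.Empty;

        // JsonUtility ignores fields it does not know, so extra fields
        // in the result do not break parsing
        VoskResult parsed = JsonUtility.FromJson<VoskResult>(json);
        if (parsed == null || parsed.text == null) return string.Empty;
        return parsed.text.Trim();
    }
}
```

In the recognition loop, the logging line could then become something like Debug.Log(VoskResultParser.ExtractText(result)), with empty strings skipped because Vosk also reports silence as an empty text field.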
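| Model type | Size | Accuracy | Suitable for |
|---|---|---|---|
| Full model | 1.8 GB | 92% | High-accuracy scenarios |
| Small model | 500 MB | 85% | Mobile and embedded devices |
| Micro model | 80 MB | 78% | Severely resource-constrained environments |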
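
Apart from model choice, where recognition runs also affects performance. Calling the native recognizer from a coroutine executes on Unity's main thread, so a slow decode can stall a frame. One common remedy is a worker thread fed by a queue. The sketch below illustrates that pattern under stated assumptions and is not the article's implementation: RecognitionWorker is a hypothetical class, and the recognize delegate stands in for a call such as voskRecognizer.ProcessAudio.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: runs a recognition delegate on a background thread.
// Audio chunks go in through Enqueue, results come out through TryDequeueResult.
public sealed class RecognitionWorker : IDisposable
{
    private readonly BlockingCollection<float[]> input = new BlockingCollection<float[]>();
    private readonly ConcurrentQueue<string> results = new ConcurrentQueue<string>();
    private readonly Func<float[], string> recognize;
    private readonly Thread thread;

    public RecognitionWorker(Func<float[], string> recognize)
    {
        this.recognize = recognize ?? throw new ArgumentNullException(nameof(recognize));
        thread = new Thread(Run) { IsBackground = true, Name = "RecognitionWorker" };
        thread.Start();
    }

    // Called from the main thread with freshly captured audio
    public void Enqueue(float[] chunk) => input.Add(chunk);

    // Called from the main thread (e.g. in Update) to collect finished results
    public bool TryDequeueResult(out string result) => results.TryDequeue(out result);

    private void Run()
    {
        // Blocks until a chunk arrives; ends once CompleteAdding has been called
        // and the queue is drained
        foreach (float[] chunk in input.GetConsumingEnumerable())
        {
            string text = recognize(chunk);
            if (!string.IsNullOrEmpty(text)) results.Enqueue(text);
        }
    }

    public void Dispose()
    {
        input.CompleteAdding();
        thread.Join();   // wait for queued chunks to finish
        input.Dispose();
    }
}
```

Results should be polled and handled on the main thread, since most Unity APIs are not thread-safe. The worker also has to be disposed before the recognizer it calls into, so that no native call is still in flight when the recognizer is freed.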
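vosk_model_new returns a null pointer

    // Debugging example (requires using System.IO)
    if (!Directory.Exists(modelPath))
    {
        Debug.LogError($"Model path does not exist: {modelPath}");
        return;
    }

- Use set_words mode to obtain intermediate results (in the Vosk C API, vosk_recognizer_set_words adds per-word detail to results, and vosk_recognizer_partial_result returns the in-progress hypothesis)
- Use a 16000 Hz sample rate (VOSK_SAMPLE_RATE_16000) to reduce computation

Microphone permission (Android, in AndroidManifest.xml)

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
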
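A custom grammar file improves the recognition rate for specific commands. Vosk takes the grammar as a flat JSON array of phrases: every space-separated word must exist in the model's vocabulary, and "[unk]" catches everything else. Example grammar.json:

    ["打开 灯", "打开 空调", "打开 窗帘", "设置温度 18度", "设置温度 22度", "设置温度 26度", "[unk]"]

Load the grammar in C#:

    // Declared inside VoskRecognizer. The grammar is passed as null-terminated
    // UTF-8 bytes because default string marshalling is not UTF-8 on every platform.
    [DllImport("vosk")]
    private static extern IntPtr vosk_recognizer_new_grm(IntPtr model, float sampleRate, byte[] grammar);

    // Usage example, in the constructor in place of vosk_recognizer_new
    // (requires using System.IO and System.Text)
    string grammarJson = File.ReadAllText(Path.Combine(Application.streamingAssetsPath, "grammar.json"));
    recognizerHandle = vosk_recognizer_new_grm(modelHandle, sampleRate, Encoding.UTF8.GetBytes(grammarJson + "\0"));
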
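
When commands are naturally organized as a verb plus a list of objects, the flat JSON array of phrases that Vosk's grammar parameter expects can be generated in code rather than written by hand. The following is a small sketch under that assumption. VoskGrammarBuilder, Expand, and ToJson are hypothetical names, and the example words are taken from this article; whether a given word is recognized still depends on the vocabulary of the loaded model.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds a Vosk grammar string (a JSON array of phrases).
public static class VoskGrammarBuilder
{
    // Combines one verb with each object, e.g. "打开" + "灯" -> "打开 灯"
    public static IEnumerable<string> Expand(string verb, params string[] objects)
    {
        foreach (string obj in objects)
            yield return verb + " " + obj;
    }

    // Serializes the phrases; "[unk]" lets the recognizer reject other speech
    public static string ToJson(IEnumerable<string> phrases, bool includeUnknown = true)
    {
        var sb = new StringBuilder("[");
        bool first = true;
        foreach (string phrase in phrases)
            AppendQuoted(sb, phrase, ref first);
        if (includeUnknown)
            AppendQuoted(sb, "[unk]", ref first);
        return sb.Append(']').ToString();
    }

    private static void AppendQuoted(StringBuilder sb, string phrase, ref bool first)
    {
        if (!first) sb.Append(", ");
        first = false;
        // Escape backslashes and quotes so the output stays valid JSON
        string escaped = phrase.Replace("\\", "\\\\").Replace("\"", "\\\"");
        sb.Append('"').Append(escaped).Append('"');
    }
}
```

For example, VoskGrammarBuilder.ToJson(VoskGrammarBuilder.Expand("打开", "灯", "空调")) yields ["打开 灯", "打开 空调", "[unk]"], which can be passed to the recognizer in place of the contents of grammar.json.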
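Switch models at runtime to support multiple languages:

    public void SwitchLanguage(string newModelPath)
    {
        // Free the native handles of the old model before loading the new one
        voskRecognizer?.Dispose();
        voskRecognizer = new VoskRecognizer(newModelPath);
    }
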
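- Standalone desktop builds: <game directory>/<game name>_Data/StreamingAssets
- Android: the libs/<ABI> directory
- iOS: Embedded Binaries

| Unity version | Vosk API version | Notes |
|---|---|---|
| 2020.3 | 0.3.45 | The Android plugin must be compiled manually |
| 2022.1 | 1.0.2 | Native plugins are loaded automatically |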
By implementing offline speech recognition in Unity with Vosk, developers gain:
Future directions include:
The complete sample project has been uploaded to GitHub as unity-vosk-demo. It includes prebuilt plugins, a sample model, and detailed documentation.