Summary: This article walks through deploying and using the FunASR speech recognition toolkit, covering environment setup, model download, API calls, and hands-on real-time transcription, helping developers quickly build efficient speech-to-text features.
FunASR is an open-source speech recognition toolkit from Alibaba's DAMO Academy. Built on PyTorch, it supports real-time streaming transcription, long-audio recognition, and multilingual models. Typical application scenarios include real-time subtitling, voice assistants, and domain-specific transcription such as medical dictation, all of which are covered in the advanced-application section below.
```shell
# Create a virtual environment (recommended)
conda create -n funasr python=3.8
conda activate funasr

# Install core dependencies
pip install torch==1.12.1+cu113 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip install funasr==0.4.2

# Optional: acoustic feature extraction library
pip install librosa==0.9.2
```
Common installation issues:

- Run `nvidia-smi` to confirm the GPU model, then install the matching torch build
- Fix model-directory permissions with `sudo chmod -R 777 /path/to/model`
- Detect version conflicts with `pip check`, then unify versions via `pip install --upgrade`

| Model | Target scenario | Parameters | Latency |
|---|---|---|---|
| paraformer | Industrial-grade applications | 120M | 80ms |
| conformer_ctc | Lightweight deployment | 30M | 50ms |
| multilingual | Mixed multilingual recognition | 210M | 120ms |
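Given the latency figures in the table above, a small helper can pick a model for a latency budget. This is an illustrative sketch, not part of the FunASR API; the selection heuristic (prefer the slowest model that still fits, since it is usually the most accurate) is an assumption:

```python
# Latency figures taken from the model table above (milliseconds).
MODEL_LATENCY_MS = {
    "paraformer": 80,
    "conformer_ctc": 50,
    "multilingual": 120,
}

def pick_model(budget_ms: int) -> str:
    """Return the highest-latency (typically most accurate) model within budget."""
    candidates = [(lat, name) for name, lat in MODEL_LATENCY_MS.items() if lat <= budget_ms]
    if not candidates:
        raise ValueError(f"no model meets a {budget_ms} ms budget")
    return max(candidates)[1]

print(pick_model(100))  # paraformer
```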
```python
# Official recommended way (auto-download)
from funasr import AutoModel

model = AutoModel.from_pretrained("paraformer", cache_dir="./models")
```

```shell
# Manual download (for intranet/air-gapped environments)
wget https://model.funasr.com/paraformer/latest/paraformer-large.zip
unzip paraformer-large.zip -d ./models
```
Adjust the key parameters in config.json:
```json
{
  "decoder": {
    "beam_size": 10,
    "max_active": 30
  },
  "feature": {
    "sample_rate": 16000,
    "frame_length": 25,
    "frame_shift": 10
  }
}
```
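The `frame_length` and `frame_shift` values are in milliseconds; frontends operate on sample counts, so at the configured 16 kHz sample rate they convert as follows (a small arithmetic sketch, not FunASR code):

```python
def ms_to_samples(ms: float, sample_rate: int = 16000) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return int(sample_rate * ms / 1000)

# Values from config.json above: 25 ms frames with a 10 ms shift at 16 kHz.
frame_samples = ms_to_samples(25)  # 400 samples per frame
shift_samples = ms_to_samples(10)  # 160 samples per shift
```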
```python
import numpy as np

from funasr.runtime.online import OnlineASR

asr = OnlineASR(
    model_dir="./models/paraformer",
    config_file="./config.json",
    device="cuda",  # or "cpu"
)

# Simulated audio-stream input (replace with a microphone or network stream)
audio_chunk = np.random.rand(1600).astype(np.float32)  # 100 ms @ 16 kHz
result = asr.decode(audio_chunk)
print(result["text"])  # recognized text
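In practice the decoder is fed fixed-size chunks sliced from a file or capture buffer. A minimal, self-contained chunker sketch (the 1600-sample / 100 ms figure comes from the snippet above; the `OnlineASR` object itself is not used here):

```python
from typing import Iterator, Sequence

def iter_chunks(samples: Sequence[float], chunk_size: int = 1600) -> Iterator[Sequence[float]]:
    """Yield successive fixed-size chunks; the final chunk may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]

# One second of audio at 16 kHz -> ten 100 ms chunks to feed asr.decode().
audio = [0.0] * 16000
chunks = list(iter_chunks(audio))
```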
```python
from funasr.runtime.offline import OfflineASR

asr = OfflineASR(model_dir="./models/paraformer", device="cuda")
result = asr.decode_file("test.wav")
print(result["text"])       # full transcript
print(result["timestamp"])  # timestamp information
```
```python
# server.py
from fastapi import FastAPI, WebSocket
from funasr.runtime.online import OnlineASR

app = FastAPI()
asr = OnlineASR(model_dir="./models/paraformer")

@app.websocket("/ws/asr")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_bytes()
        result = asr.decode(data)
        await websocket.send_text(result["text"])
```
Start the service:
```shell
uvicorn server:app --host 0.0.0.0 --port 8000
```
- Enable half-precision inference on GPU (`fp16=True`)
- Enable MKL-DNN acceleration on CPU (`export USE_MKLDNN=1`)
- Set `batch_size` to a power of two (e.g. 64/128)
```python
from funasr.utils import model_quantization

# 8-bit quantization
quantized_model = model_quantization(
    original_model="./models/paraformer",
    output_dir="./models/paraformer_quant",
)
```
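To see what 8-bit quantization does conceptually, here is a toy symmetric-quantization sketch: floats are mapped to int8 via a per-tensor scale and recovered approximately on dequantization. This illustrates the idea only; it is not FunASR's implementation:

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: floats -> (int8 values, scale)."""
    peak = max(abs(v) for v in values) or 1.0
    scale = peak / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to weights, within 1/127 of peak
```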
| Parameter | Recommended value | Effect |
|---|---|---|
| chunk_size | 320ms | Larger values reduce responsiveness but improve accuracy |
| overlap_size | 80ms | Larger values improve endpoint detection |
| beam_size | 10 | Larger values improve accuracy at the cost of latency |
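With overlapping chunks, the effective hop is `chunk_size - overlap_size`, which determines how many decoding steps an utterance takes. A small sketch of that arithmetic (the hop formula is an assumption about how the overlap is applied):

```python
import math

def num_chunks(duration_ms: int, chunk_ms: int = 320, overlap_ms: int = 80) -> int:
    """Chunks needed to cover an utterance when consecutive chunks
    overlap by overlap_ms (hop = chunk_ms - overlap_ms)."""
    hop = chunk_ms - overlap_ms
    if duration_ms <= chunk_ms:
        return 1
    return 1 + math.ceil((duration_ms - chunk_ms) / hop)

print(num_chunks(1000))  # a 1 s utterance needs 4 chunks at the defaults
```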
- Enable noise augmentation (`add_noise=True`) to improve robustness in noisy environments
```python
# Attach an external language model for domain adaptation
asr = OnlineASR(
    model_dir="./models/paraformer",
    lm_dir="./lm/zh.arpa",
)
```
For mixed or long audio, use chunk_hopping mode: segment the input with VAD, then decode each segment:

```python
segments = vad_segment("mixed.wav", frame_size=320)
for seg in segments:
    result = asr.decode(seg["audio"])
```
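To illustrate what `vad_segment` does conceptually, here is a toy energy-based VAD that returns sample ranges whose frame energy exceeds a threshold. Real systems use trained models; the function name, threshold, and merge logic here are all illustrative:

```python
def energy_vad(samples, frame_size=320, threshold=0.01):
    """Return (start, end) sample ranges whose mean energy exceeds threshold.
    Adjacent active frames are merged into one segment."""
    active = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            if active and active[-1][1] == start:
                # Extend the previous segment when frames are contiguous.
                active[-1] = (active[-1][0], start + len(frame))
            else:
                active.append((start, start + len(frame)))
    return active

# Silence, then a loud burst, then silence again.
signal = [0.0] * 640 + [0.5] * 640 + [0.0] * 640
print(energy_vad(signal))  # [(640, 1280)]
```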
# 7. Advanced Application Scenarios

## 1. Real-Time Subtitle System

```javascript
// Front-end WebSocket connection example
const socket = new WebSocket("ws://asr-server:8000/ws/asr");
socket.onmessage = (event) => {
  document.getElementById("subtitle").innerText = event.data;
};

// Audio-stream capture (browser environment)
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(stream);
```
## 2. Intelligent Voice Assistant

```python
# Dialogflow integration example
from google.cloud import dialogflow_v2 as dialogflow

def detect_intent(text):
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path("project-id", "session-id")
    text_input = dialogflow.TextInput(text=text, language_code="zh-CN")
    query_input = dialogflow.QueryInput(text=text_input)
    response = session_client.detect_intent(session=session, query_input=query_input)
    return response.query_result.fulfillment_text

# Real-time ASR and NLU loop
while True:
    audio_data = get_audio_chunk()
    text = asr.decode(audio_data)["text"]
    reply = detect_intent(text)
    send_to_speaker(reply)
```
## 3. Medical Transcription

```python
# Terminology enhancement
medical_terms = {
    "高血压": "hypertension",
    "冠心病": "coronary heart disease",
}

def enhance_medical_text(text):
    for chinese, english in medical_terms.items():
        text = text.replace(chinese, f"{chinese}({english})")
    return text

# DICOM system integration
from pydicom import dcmread

def process_medical_audio(audio_path, dicom_path):
    dicom_data = dcmread(dicom_path)
    patient_id = dicom_data.PatientID
    text = asr.decode_file(audio_path)["text"]
    enhanced_text = enhance_medical_text(text)
    # Persist to the medical database
    save_to_database(patient_id, enhanced_text)
```
# 8. Production Deployment

## 1. Docker Containerization

```dockerfile
# Dockerfile example
FROM nvidia/cuda:11.3.1-base-ubuntu20.04
RUN apt-get update && apt-get install -y \
    python3.8 \
    python3-pip \
    ffmpeg
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```shell
docker build -t funasr-asr .
docker run -d --gpus all -p 8000:8000 funasr-asr
```
## 2. Nginx Load Balancing

```nginx
# Upstream pool referenced by proxy_pass below (host/port are placeholders)
upstream asr_servers {
    server 127.0.0.1:8000;
}

server {
    listen 80;
    location /ws/asr {
        proxy_pass http://asr_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
## 3. Monitoring and Alerting

```python
# Expose Prometheus metrics
from fastapi import Response
from prometheus_client import Counter, generate_latest

REQUEST_COUNT = Counter('asr_requests_total', 'Total ASR requests')
ERROR_COUNT = Counter('asr_errors_total', 'Total ASR errors')

@app.get("/metrics")
def metrics():
    return Response(content=generate_latest(), media_type="text/plain")

# Inside the ASR handler
REQUEST_COUNT.inc()
try:
    result = asr.decode(audio)
except Exception:
    ERROR_COUNT.inc()
```
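Raw counters only grow; alerting usually keys off the error rate over a recent window of requests. A stdlib sketch of that idea (class name and the 10% threshold are illustrative, not from FunASR or Prometheus):

```python
from collections import deque

class ErrorRateMonitor:
    """Track outcomes of the last `window` requests and flag a high error rate."""

    def __init__(self, window: int = 100, threshold: float = 0.1):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold

mon = ErrorRateMonitor(window=10, threshold=0.2)
for ok in [True] * 7 + [False] * 3:
    mon.record(ok)
print(mon.should_alert())  # True: 30% errors exceeds the 20% threshold
```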
With the walkthrough above, developers can take FunASR all the way from environment setup to advanced applications. Before production rollout, validate performance metrics in a test environment (the funasr-benchmark tool is recommended), then scale out gradually. For workloads above 100,000 requests per day, a Kubernetes cluster deployment is recommended.