Implement Fun-ASR Streaming Speech Recognition in 5 Steps: A Complete Front-End Recording + Back-End Real-Time Transcription Solution
1. Project Overview and Environment Setup
Fun-ASR-MLT-Nano-2512 is a lightweight multilingual speech recognition model released by Alibaba's Tongyi Lab, offering high-accuracy recognition across 31 languages. This article walks you through building a complete streaming speech recognition system from scratch, with real-time "text appears as you speak" output.
1.1 Key Features
- Multilingual support: covers 31 languages, including Chinese, English, Japanese, and Korean
- Lightweight design: only 800M parameters; runs smoothly on a consumer-grade GPU
- Real-time streaming: accepts a continuous stream of audio chunks and returns recognition results incrementally
- Context preservation: automatically maintains dialogue state for coherent recognition of long speech
1.2 Basic Environment Setup
```bash
# Install system dependencies
sudo apt-get update && sudo apt-get install -y ffmpeg

# Create a Python virtual environment
python -m venv asr_env
source asr_env/bin/activate

# Install Python dependencies
pip install torch torchaudio websockets python-multipart
```
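Before moving on, it can help to confirm the packages installed above are actually importable from the virtual environment. Here is a small sketch using only the standard library; `check_deps` is a local helper written for this article, not part of any package:

```python
import importlib.util

def check_deps(modules=("torch", "torchaudio", "websockets")):
    """Return the names in `modules` that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

missing = check_deps()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```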
2. Model Deployment and Core API Walkthrough
2.1 Fast Model Loading
Create model_loader.py and load the model as a singleton to avoid repeated initialization:
```python
from funasr import AutoModel

_model_instance = None

def get_model(device="cuda:0"):
    """Load Fun-ASR once and reuse the same instance on later calls."""
    global _model_instance
    if _model_instance is None:
        print("Loading the Fun-ASR model...")
        _model_instance = AutoModel(
            model="Fun-ASR-MLT-Nano-2512",
            trust_remote_code=True,
            device=device
        )
    return _model_instance
```
2.2 Key Streaming Parameters
The model keeps state across calls through the cache parameter:
```python
# A typical streaming call
result = model.generate(
    input=audio_chunk,     # current audio chunk
    cache=previous_cache,  # state from the previous call
    batch_size=1,
    language="中文",
    itn=True               # enable inverse text normalization
)
current_cache = result[0]["cache"]  # save for the next call
```
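To make the role of cache concrete, here is a minimal sketch of the chunk loop. The recognizer is replaced by `fake_generate`, a hypothetical stand-in (not the funasr API), so the state-threading pattern is visible without loading the real model:

```python
def fake_generate(chunk, cache):
    """Hypothetical stand-in for model.generate: appends the chunk's text
    and carries the running transcript forward in the cache dict."""
    text = cache.get("transcript", "") + chunk
    return [{"text": text, "cache": {"transcript": text}}]

# Feed chunks one by one, threading the cache between calls,
# exactly as the streaming loop does with the real model.
cache = {}
for chunk in ["hel", "lo ", "world"]:
    result = fake_generate(chunk, cache)
    cache = result[0]["cache"]  # save state for the next chunk

print(result[0]["text"])  # accumulated transcript: "hello world"
```

The only invariant that matters is that the cache returned by one call is passed unchanged into the next; everything else about the stub is illustrative.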
3. WebSocket Server Implementation
3.1 Core Server Architecture
```
asr_server/
├── __init__.py
├── server.py     # main WebSocket service
└── processor.py  # streaming recognition logic
```
3.2 Streaming Processor Implementation
Key code in processor.py:
```python
import numpy as np
import torch

from model_loader import get_model

model = get_model()  # shared singleton from Section 2.1

class StreamProcessor:
    def __init__(self):
        self.buffer = np.array([], dtype=np.float32)
        self.sample_rate = 16000
        self.cache = {}  # streaming state carried between calls

    def add_audio(self, pcm_data):
        """Append 16-bit PCM audio data."""
        audio = np.frombuffer(pcm_data, dtype=np.int16)
        audio = audio.astype(np.float32) / 32768.0
        self.buffer = np.concatenate([self.buffer, audio])

    def process(self, language="中文"):
        """Run one streaming recognition step."""
        if len(self.buffer) < self.sample_rate * 0.2:  # need at least 200 ms
            return {"text": "", "final": False}

        waveform = torch.from_numpy(self.buffer).unsqueeze(0)
        result = model.generate(
            input=waveform,
            cache=self.cache,
            language=language
        )
        if result:
            self.cache = result[0].get("cache", {})
            return {"text": result[0]["text"], "final": False}
        return {"text": "", "final": False}
```
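The 16-bit PCM normalization inside add_audio can be checked in isolation. This sketch mirrors the same bytes-to-float conversion using only the standard library (struct in place of numpy); `pcm16_to_float` is an illustrative helper, not part of the project code:

```python
import struct

def pcm16_to_float(pcm_bytes):
    """Decode little-endian 16-bit PCM bytes to floats in [-1.0, 1.0),
    mirroring the int16 -> float32 / 32768.0 step in StreamProcessor."""
    n = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % n, pcm_bytes)
    return [s / 32768.0 for s in samples]

# Two samples: full-scale negative (-32768) and half-scale positive (16384)
data = struct.pack("<2h", -32768, 16384)
print(pcm16_to_float(data))  # [-1.0, 0.5]
```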
3.3 WebSocket Server Main Loop
Core logic of server.py:
```python
import asyncio
import json

import websockets

from processor import StreamProcessor

async def handle_client(websocket):
    processor = StreamProcessor()
    async for message in websocket:
        data = json.loads(message)
        if data["type"] == "audio":
            # the browser sends raw bytes as a JSON array of integers,
            # so convert back to bytes before buffering
            processor.add_audio(bytes(data["data"]))
            result = processor.process(data.get("language", "中文"))
            await websocket.send(json.dumps(result))
        elif data["type"] == "reset":
            processor = StreamProcessor()

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8765):
        print("ASR service started at ws://localhost:8765")
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```
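The JSON wire format between client and server is simple enough to pin down with a round-trip example. The field names match the handler above; the byte values here are purely illustrative:

```python
import json

# An "audio" message as the browser sends it: raw bytes serialized
# as a JSON array of integers, plus the requested language.
msg = {"type": "audio", "data": [0, 127, 255, 128], "language": "中文"}
wire = json.dumps(msg)

# The server parses the message and recovers the original bytes.
parsed = json.loads(wire)
pcm = bytes(parsed["data"])

print(parsed["type"], len(pcm))  # audio 4
```

Shipping bytes as a JSON integer array is easy to debug but roughly triples the payload size; a binary WebSocket frame would be the leaner choice in production.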
4. Front-End Recording and Real-Time Interaction
4.1 Core Web Recording Code
```html
<script>
let mediaRecorder, socket;

async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // Note: MediaRecorder emits Opus-compressed WebM chunks, not raw PCM,
  // so the server must decode them (e.g. with ffmpeg) before recognition.
  mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm;codecs=opus',
    audioBitsPerSecond: 16000
  });
  socket = new WebSocket('ws://localhost:8765');

  mediaRecorder.ondataavailable = async (e) => {
    const audioData = await e.data.arrayBuffer();
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(JSON.stringify({
        type: "audio",
        data: Array.from(new Uint8Array(audioData)),
        language: document.getElementById('lang').value
      }));
    }
  };

  mediaRecorder.start(200); // fire dataavailable every 200 ms
}

function stopRecording() {
  mediaRecorder.stop();
  socket.close();
}
</script>
```
4.2 Displaying Results in Real Time
```javascript
socket.onmessage = (event) => {
  const result = JSON.parse(event.data);
  const outputDiv = document.getElementById('output');
  outputDiv.textContent += result.text;
  // auto-scroll to the bottom
  outputDiv.scrollTop = outputDiv.scrollHeight;
};
```
5. Production Deployment and Optimization
5.1 Docker Containerization
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN apt-get update && apt-get install -y ffmpeg && \
    pip install -r requirements.txt
EXPOSE 8765
CMD ["python", "asr_server/server.py"]
```
5.2 Managing the Service with systemd
Create a systemd service unit at /etc/systemd/system/funasr.service:
```ini
[Unit]
Description=Fun-ASR Streaming Service
After=network.target

[Service]
User=asruser
WorkingDirectory=/opt/funasr
ExecStart=/opt/funasr/venv/bin/python -m asr_server.server
Restart=always

[Install]
WantedBy=multi-user.target
```
5.3 Performance Monitoring and Tuning
```bash
# Monitor GPU usage
watch -n 1 nvidia-smi

# Follow service logs
journalctl -u funasr -f

# Check network listeners
ss -tulnp | grep 8765
```
