SenseVoice-small-ONNX Open-Source ASR Tutorial: Calling the Model from Python with funasr-onnx

news 2026/3/27 3:51:15

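1. Project Overview

SenseVoice-small-ONNX is a quantized ONNX build of a multilingual speech recognition model. It transcribes speech in Chinese, Cantonese, English, Japanese, Korean, and other languages. After quantization the model is only about 230 MB, yet recognition quality remains good, which makes it a practical choice for deployment in resource-constrained environments.

With the funasr-onnx framework, the model can be called from Python to build an efficient speech recognition service. It is a suitable foundation for voice assistants, meeting transcription systems, and multilingual translation tools.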

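2. Environment Setup and Installation

Before using the model, set up the development environment as follows.

2.1 Installing Dependencies

Open a terminal and run the following command to install the required Python packages: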

pip install funasr-onnx gradio fastapi uvicorn soundfile jieba

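Each package serves the following purpose:

funasr-onnx: the core speech recognition framework
gradio: builds the web interface
fastapi and uvicorn: provide the REST API service
soundfile: reads and writes audio files
jieba: Chinese word segmentation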
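After installation, a quick check that every package can be found may save debugging time later. The following is a minimal sketch using only the standard library; the helper name missing_modules is an illustrative assumption, and the import names are assumed to match the packages listed above (the pip package funasr-onnx is imported as funasr_onnx).

```python
from importlib.util import find_spec

# Import names for the packages installed above. Note that the pip package
# "funasr-onnx" is imported as "funasr_onnx".
REQUIRED_MODULES = ["funasr_onnx", "gradio", "fastapi", "uvicorn", "soundfile", "jieba"]


def missing_modules(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

The check only looks the modules up; it does not import them, so it runs quickly even for large packages.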

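2.2 Preparing the Model

The SenseVoice-small-ONNX model is loaded automatically from a local cache path, so no manual download is needed. By default the model is stored at:

/root/ai-models/danieldong/sensevoice-small-onnx-quant

On first use, the model files are downloaded automatically. The quantized model file is named model_quant.onnx and is about 230 MB.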
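To confirm that the download has completed, the model file can be checked on disk. The following is a minimal sketch, assuming the file sits directly inside the model directory under the name given above; the helper name model_file_size_mb is an illustrative assumption.

```python
import os


def model_file_size_mb(model_dir, filename="model_quant.onnx"):
    """Return the size of the model file in MB, or None if it is missing."""
    model_path = os.path.join(model_dir, filename)
    if not os.path.isfile(model_path):
        return None
    return os.path.getsize(model_path) / (1024 * 1024)


size = model_file_size_mb("/root/ai-models/danieldong/sensevoice-small-onnx-quant")
if size is None:
    print("model_quant.onnx not found; it may not have been downloaded yet.")
else:
    print(f"model_quant.onnx found ({size:.0f} MB)")
```

A size far below the expected 230 MB would suggest an interrupted download.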

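3. Quick Start: Launching the Speech Recognition Service

3.1 Starting the Web Service

Start the speech recognition service with the following command: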

python3 app.py --host 0.0.0.0 --port 7860

Once the service is running, it is available at the following addresses:

Web interface: http://localhost:7860
API documentation: http://localhost:7860/docs
Health check: http://localhost:7860/health
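A script that depends on the service can poll the health check address before sending audio. The following is a minimal sketch using only the standard library. It assumes only that the /health endpoint answers with HTTP 200 when the service is ready; the response body is not inspected because its format is not described here. The helper name is_service_up is an illustrative assumption.

```python
import urllib.error
import urllib.request


def is_service_up(base_url="http://localhost:7860", timeout=3.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, a timeout, or an HTTP error status all count as "not up".
        return False
```

Calling is_service_up() shortly after launching app.py may return False while the model is still loading, so a caller would typically retry for a few seconds.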

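3.2 Using the Web Interface

The web interface presents a simple upload page:

Click the "上传音频" (Upload Audio) button and select an audio file
Choose the recognition language (automatic detection is supported)
Click the "转写" (Transcribe) button to start recognition
Review the transcript and the time taken

The interface also shows a detailed log of the recognition process, including the detected language and the recognition progress.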

4. Calling the Model Directly from Python

Besides the web service, the model can be called directly from Python code.

4.1 Basic Usage

from funasr_onnx import SenseVoiceSmall

# Initialize the model
model = SenseVoiceSmall(
    "/root/ai-models/danieldong/sensevoice-small-onnx-quant",
    batch_size=10,
    quantize=True
)

# Recognize a single file
result = model(["audio.wav"], language="auto", use_itn=True)
print(result[0])
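SenseVoice models usually prefix the raw transcript with tags for language, emotion, audio event, and text normalization, for example <|zh|><|NEUTRAL|><|Speech|><|withitn|>. Assuming result[0] is a string of that form, the following is a minimal sketch for removing the tags with a regular expression. The helper name strip_sensevoice_tags is made up for illustration; the FunASR project also ships its own rich_transcription_postprocess helper, which may be the better choice when emotion or event markers should be kept in some form.

```python
import re

# Matches tags such as <|zh|>, <|NEUTRAL|>, <|Speech|>, <|withitn|>.
TAG_PATTERN = re.compile(r"<\|[^|]*\|>")


def strip_sensevoice_tags(text):
    """Remove <|...|> tags and surrounding whitespace from a raw transcript."""
    return TAG_PATTERN.sub("", text).strip()


raw = "<|zh|><|NEUTRAL|><|Speech|><|withitn|>你好，世界。"
print(strip_sensevoice_tags(raw))
```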

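4.2 Batch Processing

To process several audio files, pass them to the model in a single call:

import os

# Collect the full paths of all audio files in the directory
audio_dir = "audio_dir"
audio_files = [
    os.path.join(audio_dir, f)
    for f in sorted(os.listdir(audio_dir))
    if f.endswith(('.wav', '.mp3'))
]

# Recognize all files in one batch
results = model(audio_files, language="zh", use_itn=True)
for audio_file, result in zip(audio_files, results):
    print(f"Result for {audio_file}:")
    print(result)
    print("-" * 50)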
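For a directory with many files, processing the list in chunks lets results be saved as each chunk finishes instead of waiting for the whole directory. The following is a minimal sketch of such chunking; the helper name chunked and the chunk size are illustrative assumptions, not part of funasr-onnx.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


paths = [f"audio_{i}.wav" for i in range(7)]
for batch in chunked(paths, 3):
    print(batch)
```

Each batch could then be passed to the model in place of the full file list, with the same language and use_itn arguments.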

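4.3 Advanced Parameters

# Advanced configuration example.
# Confirm that the installed funasr-onnx release accepts each keyword below
# (device, num_threads, hotword, beam_size) before relying on it.
model = SenseVoiceSmall(
    model_dir="/root/ai-models/danieldong/sensevoice-small-onnx-quant",
    batch_size=5,      # batch size
    quantize=True,     # use the quantized model
    device="cpu",      # run inference on the CPU
    num_threads=4      # number of threads
)

# Recognition with detailed parameters
result = model(
    ["meeting_recording.wav"],
    language="auto",                # detect the language automatically
    use_itn=True,                   # enable inverse text normalization
    hotword="技术术语 产品名称",    # hotwords to improve recognition of specific terms
    beam_size=10                    # beam search width
)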

5. Calling the REST API

5.1 Basic API Call

The HTTP interface makes it straightforward to integrate the service into other systems:

curl -X POST "http://localhost:7860/api/transcribe" \
  -F "file=@audio.wav" \
  -F "language=auto" \
  -F "use_itn=true"

5.2 Calling the API from Python

import requests

def transcribe_audio(file_path, language="auto"):
    url = "http://localhost:7860/api/transcribe"
    # Upload the audio file as multipart form data
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {'language': language, 'use_itn': 'true'}
        response = requests.post(url, files=files, data=data)
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"Request failed with status code {response.status_code}"}

# Usage example
result = transcribe_audio("test_audio.wav", "zh")
print(result)
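Requests can fail transiently, for example while the service is still loading the model. The following is a minimal sketch of a retry wrapper with exponential backoff. The helper name with_retries and its parameters are illustrative assumptions; it accepts any callable, so it can wrap a function such as transcribe_audio without depending on the API's response format.

```python
import time


def with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` calls have failed.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

A call such as with_retries(lambda: transcribe_audio("test_audio.wav", "zh")) retries only when an exception is raised, for example on a connection error. A non-200 response comes back from transcribe_audio as an error dictionary and is not retried.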

5.3 Batch Processing via the API

For batch jobs, call the API in a loop:

import glob
import time

# Uses transcribe_audio() from section 5.2
def batch_transcribe(audio_dir, output_file):
    audio_files = glob.glob(f"{audio_dir}/*.wav") + glob.glob(f"{audio_dir}/*.mp3")
    with open(output_file, 'w', encoding='utf-8') as f:
        for audio_file in audio_files:
            print(f"Processing file: {audio_file}")
            try:
                result = transcribe_audio(audio_file)
                if 'text' in result:
                    f.write(f"{audio_file}: {result['text']}\n")
                else:
                    f.write(f"{audio_file}: recognition failed\n")
                # Avoid sending requests too frequently
                time.sleep(0.1)
            except Exception as e:
                print(f"Error while processing {audio_file}: {e}")
                f.write(f"{audio_file}: processing error\n")

# Transcribe every audio file in the directory
batch_transcribe("audio_recordings", "transcription_results.txt")

6. Practical Tips and Best Practices

6.1 Audio Preprocessing

Preprocessing the audio can improve recognition results:

import soundfile as sf
import numpy as np

def preprocess_audio(input_path, output_path):
    # Read the audio file
    data, samplerate = sf.read(input_path)
    # Convert to mono if the audio has multiple channels
    if len(data.shape) > 1:
        data = np.mean(data, axis=1)
    # Normalize the peak level to 0.9; skip silent audio to avoid dividing by zero
    peak = np.max(np.abs(data))
    if peak > 0:
        data = data / peak * 0.9
    # Save the processed audio
    sf.write(output_path, data, samplerate)
    return output_path

# Usage example
processed_audio = preprocess_audio("raw_audio.wav", "processed_audio.wav")
result = model([processed_audio], language="auto")
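Before preprocessing, it can be useful to check a file's sample rate, channel count, and duration. The following is a minimal sketch using only the standard-library wave module. It handles uncompressed PCM WAV files only, and the helper name wav_info is an illustrative assumption.

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_seconds": frames / rate,
        }
```

A file reporting two channels is one that the mono conversion step would change, and the duration helps decide whether a recording should be split before recognition.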

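6.2 Improving Recognition with Hotwords

For domain-specific vocabulary, the hotword option can improve recognition accuracy:

# Hotwords for the medical domain, passed as one space-separated string
medical_hotwords = "心电图 CT扫描 MRI 血压计 血糖仪 抗生素"
result = model(
    ["medical_consultation.wav"],
    language="zh",
    hotword=medical_hotwords,
    use_itn=True
)

# Hotwords for the technology domain
tech_hotwords = "Python JavaScript 人工智能 机器学习 深度学习 神经网络"
result = model(
    ["tech_presentation.wav"],
    language="en",
    hotword=tech_hotwords
)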

6.3 Performance Tuning

import soundfile as sf

# Adjust the parameters to match the hardware
optimal_model = SenseVoiceSmall(
    model_dir="/root/ai-models/danieldong/sensevoice-small-onnx-quant",
    batch_size=8,      # adjust to the available memory
    device="cpu",      # run on the CPU
    num_threads=6,     # adjust to the number of CPU cores
    quantize=True
)

# Long audio can be split into shorter segments first
def split_long_audio(audio_path, segment_duration=30):
    """Split a long audio file into segments of segment_duration seconds."""
    data, samplerate = sf.read(audio_path)
    # Segment length in samples; int() keeps range() valid for fractional durations
    segment_length = int(segment_duration * samplerate)
    segments = []
    for i in range(0, len(data), segment_length):
        segment = data[i:i+segment_length]
        segment_path = f"segment_{i//segment_length}.wav"
        sf.write(segment_path, segment, samplerate)
        segments.append(segment_path)
    return segments
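To judge whether a change to the batch size or thread count actually helps, it is worth measuring it. A common ASR metric is the real-time factor (RTF): processing time divided by audio duration, where values below 1 mean faster than real time. The following is a minimal sketch; the helper names are illustrative assumptions, and the callable passed to timed would be a model call in practice.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(elapsed_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


# Example with a stand-in function instead of a real model call.
result, elapsed = timed(sum, [1, 2, 3])
print(result, real_time_factor(elapsed, audio_seconds=30.0))
```

Comparing the RTF of the same recording under two configurations gives a direct view of which setting is faster on a given machine.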