Qwen3-ASR-1.7B Open-Source Model Tutorial: Batch Audio-to-Text Transcription by Calling the API from Python

news 2026/4/25 5:11:23

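1. A Quick Look at the Qwen3-ASR-1.7B Speech Recognition Model

Qwen3-ASR-1.7B is an open-source speech recognition model from Alibaba Cloud's Tongyi Qianwen (Qwen) team and is the high-accuracy variant in the ASR series. Its main strengths are high recognition accuracy and broad language coverage.

In short, the model automatically converts the speech in audio files into text. Meeting recordings, interviews, and voice notes can all be turned into editable text quickly.

Key advantages:

Supports 52 languages and dialects: 30 major languages and 22 Chinese dialects
1.7B parameters, with higher recognition accuracy than the smaller variant
Detects the language automatically, so you do not need to specify it
Holds up reasonably well in noisy environments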

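2. Environment Setup and API Basics

2.1 Installing the Required Python Libraries

Before starting, install the Python libraries used in this tutorial. Open a terminal and run: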

pip install requests soundfile numpy

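What each library is for:

requests: sends HTTP requests to the speech recognition API
soundfile: reads and processes audio files
numpy: numerical library needed when working with audio data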
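
Before moving on, it can help to confirm that the packages are importable from the interpreter you plan to use. The following is a minimal sketch; the helper name `check_packages` is illustrative and not part of any library.

```python
import importlib.util


def check_packages(names):
    """Map each module name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

Calling `check_packages(["requests", "soundfile", "numpy"])` returns a dictionary; any entry mapped to `False` points to a package that still needs to be installed in the active environment.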

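2.2 Preparing Test Audio Files

To try things out, prepare a few test audio files first. You can record your own or download sample audio. Supported formats include:

WAV (recommended; gives the best results)
MP3 (the most common audio format)
FLAC (lossless compression)
OGG (open-source audio format)

Put all the audio files you want to process in a single folder; this makes batch processing easier.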
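
Once the files are in one folder, listing them can be sketched as below. The helper name `collect_audio_files` and the extension set are assumptions based on the format list above; check which formats your deployment actually accepts.

```python
from pathlib import Path

# Extensions taken from the format list above; adjust to your service.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def collect_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Comparing the lowercased suffix also picks up files such as `RECORDING.WAV`, which a case-sensitive pattern like `*.wav` would miss on Linux.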

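3. Transcribing a Single File by Calling the API from Python

We start with the simplest case, transcribing a single file, so you can walk through the whole flow once.

3.1 Basic Transcription Example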

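import requests


def transcribe_audio(audio_path, api_url):
    """
    Transcribe a single audio file to text.

    Args:
        audio_path: path to the audio file
        api_url: URL of the API service

    Returns:
        The recognized text, or an error message on failure.
    """
    try:
        # Open the audio file in binary mode
        with open(audio_path, 'rb') as audio_file:
            files = {'audio': audio_file}

            # Send the file to the speech recognition API.
            # The timeout (seconds) is an assumption; tune it to your audio length.
            response = requests.post(api_url, files=files, timeout=300)

        if response.status_code == 200:
            result = response.json()
            return result.get('text', 'Recognition failed')
        else:
            return f"Request failed, status code: {response.status_code}"

    except Exception as e:
        return f"Error during processing: {str(e)}"


# Usage example
if __name__ == "__main__":
    # Replace with your API URL
    api_url = "https://gpu-<your-instance-ID>-7860.web.gpu.csdn.net/transcribe"

    # Replace with the path to your audio file
    audio_file = "test_audio.wav"

    result = transcribe_audio(audio_file, api_url)
    print(f"Recognition result: {result}")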
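
Network calls can fail for transient reasons such as timeouts or a service that is restarting. One common way to handle this is to retry with an increasing delay. The sketch below is a generic wrapper; the name `with_retries` and its parameters are illustrative assumptions, and it only helps with callables that raise on failure (for example a variant of the request code that calls `response.raise_for_status()`).

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have been used.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries
    and re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` is enough.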

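3.2 Handling the Response

The API typically returns JSON containing the recognized text and related metadata. The structure below is an assumed example; check the actual response of your deployment for the exact field names.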

# Suppose the API returns a structure like this
response_data = {
    "text": "This is the recognized text",
    "language": "zh",      # detected language code
    "confidence": 0.95,    # recognition confidence
    "duration": 10.5       # audio duration in seconds
}


# You can extract and use these fields like this
def process_result(result_json):
    text = result_json.get('text', '')
    language = result_json.get('language', 'unknown')
    confidence = result_json.get('confidence', 0)

    print(f"Recognized text: {text}")
    print(f"Detected language: {language}")
    print(f"Confidence: {confidence:.2f}")

    return text
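
If the response really does carry a `confidence` field as in the assumed structure above, it can be used to flag transcripts for manual review. This is a minimal sketch; the function name `needs_review` and the 0.8 threshold are illustrative choices, not values from the model's documentation.

```python
def needs_review(result_json, threshold=0.8):
    """Return True when a result is empty or below the confidence threshold.

    A missing `confidence` field is treated as 0 so the result is flagged
    rather than silently accepted.
    """
    text = result_json.get("text", "").strip()
    confidence = result_json.get("confidence", 0)
    return not text or confidence < threshold
```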

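4. Batch Processing Multiple Audio Files

Next we implement batch processing, so that every audio file in a folder is handled in one run.

4.1 Batch Transcription Code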

import time
from pathlib import Path


def batch_transcribe(audio_folder, api_url, output_file="results.txt"):
    """
    Transcribe every audio file in a folder.

    Args:
        audio_folder: folder containing the audio files
        api_url: URL of the API service
        output_file: path of the output file
    """
    # Supported audio formats
    supported_formats = ['.wav', '.mp3', '.flac', '.ogg']

    # Collect all supported audio files
    audio_files = []
    for ext in supported_formats:
        audio_files.extend(Path(audio_folder).glob(f"*{ext}"))

    print(f"Found {len(audio_files)} audio files to process")

    # Open the output file
    with open(output_file, 'w', encoding='utf-8') as f_out:
        f_out.write("Filename\tResult\tStatus\n")

        for i, audio_path in enumerate(audio_files, 1):
            print(f"Processing file {i}/{len(audio_files)}: {audio_path.name}")

            try:
                # Call the transcription function from section 3.1.
                # Note: it reports API errors in its return value instead of
                # raising, so also check the result column for error messages.
                result = transcribe_audio(str(audio_path), api_url)

                # Write the result
                f_out.write(f"{audio_path.name}\t{result}\tsuccess\n")
                print(f"✓ Done: {audio_path.name}")

            except Exception as e:
                error_msg = f"Processing failed: {str(e)}"
                f_out.write(f"{audio_path.name}\t{error_msg}\tfailed\n")
                print(f"✗ Failed: {audio_path.name} - {error_msg}")

            # Short delay to avoid sending requests too frequently
            time.sleep(0.5)

    print(f"Batch processing complete! Results saved to: {output_file}")


# Usage example
if __name__ == "__main__":
    audio_folder = "path/to/your/audio/folder"
    api_url = "your API URL"

    batch_transcribe(audio_folder, api_url)
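
A tab-separated line per file breaks down when a transcript itself contains tabs or line breaks. As an alternative, the standard library `csv` module can handle the quoting. The sketch below assumes each row is a `(filename, text, status)` tuple; the helper name `write_results_csv` is illustrative.

```python
import csv


def write_results_csv(rows, output_file):
    """Write (filename, text, status) rows to a CSV file.

    The csv module quotes fields that contain commas, quotes, or line
    breaks, so multi-line transcripts stay in a single record.
    """
    with open(output_file, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "text", "status"])
        writer.writerows(rows)
```

Reading the file back with `csv.reader` (again opened with `newline=""`) recovers the original fields unchanged.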

4.2 Adding Progress Display and Error Handling

To make batch runs friendlier, we can add a progress bar and collect errors separately:

import time
from pathlib import Path


def improved_batch_transcribe(audio_folder, api_url, output_file="results.txt"):
    """
    Improved batch function with a progress bar and error handling.
    """
    from tqdm import tqdm  # install first: pip install tqdm

    folder = Path(audio_folder)
    audio_files = (
        list(folder.glob('*.wav'))
        + list(folder.glob('*.mp3'))
        + list(folder.glob('*.flac'))
        + list(folder.glob('*.ogg'))
    )

    results = []
    errors = []

    with tqdm(audio_files, desc="Processing audio files") as pbar:
        for audio_path in pbar:
            pbar.set_postfix(file=audio_path.name[:20] + "...")

            try:
                result = transcribe_audio(str(audio_path), api_url)
                results.append({
                    'filename': audio_path.name,
                    'text': result,
                    'status': 'success'
                })
            except Exception as e:
                errors.append({
                    'filename': audio_path.name,
                    'error': str(e)
                })

            time.sleep(0.3)  # throttle the request rate

    # Save the results
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write("Batch speech recognition results\n")
        f.write("=" * 50 + "\n\n")

        for result in results:
            f.write(f"File: {result['filename']}\n")
            f.write(f"Result: {result['text']}\n")
            f.write("-" * 30 + "\n")

    print(f"Done! Succeeded: {len(results)}, failed: {len(errors)}")
    return results, errors
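
The loops above send one request at a time. If your service tolerates parallel requests (an assumption you should verify, since the delays above exist to avoid overloading it), a thread pool can shorten the total run time. This sketch takes the transcription function as a parameter; `transcribe_concurrently` and `max_workers=4` are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_concurrently(paths, transcribe, max_workers=4):
    """Run transcribe(path) for each path on a thread pool.

    Returns (results, errors): results is a list of (path, text) pairs in
    input order, errors is a list of (path, message) pairs.
    """
    def safe_call(path):
        try:
            return path, transcribe(path), None
        except Exception as exc:  # keep going when one file fails
            return path, None, str(exc)

    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, text, error in pool.map(safe_call, paths):
            if error is None:
                results.append((path, text))
            else:
                errors.append((path, error))
    return results, errors
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than computing.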

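5. Advanced Features and Practical Tips

5.1 Handling Long Audio Files

If an audio file is long, consider splitting it into segments and transcribing them one by one: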

from pathlib import Path


def split_long_audio(audio_path, segment_duration=30):
    """
    Split a long audio file into short segments (requires pydub).
    Install first: pip install pydub
    """
    from pydub import AudioSegment
    from pydub.utils import make_chunks

    audio = AudioSegment.from_file(audio_path)
    chunks = make_chunks(audio, segment_duration * 1000)  # milliseconds

    # Save the segments
    output_dir = Path("temp_chunks")
    output_dir.mkdir(exist_ok=True)

    chunk_files = []
    for i, chunk in enumerate(chunks):
        chunk_file = output_dir / f"chunk_{i:03d}.wav"
        chunk.export(str(chunk_file), format="wav")
        chunk_files.append(chunk_file)

    return chunk_files


def transcribe_long_audio(audio_path, api_url):
    """
    Transcribe a long audio file segment by segment.
    """
    print("Long audio detected, processing in segments...")

    chunk_files = split_long_audio(audio_path)
    full_text = []

    for chunk_file in chunk_files:
        result = transcribe_audio(str(chunk_file), api_url)
        full_text.append(result)

        # Remove the temporary segment file
        chunk_file.unlink()

    # Remove the temporary directory if it is empty
    temp_dir = Path("temp_chunks")
    if temp_dir.exists() and not any(temp_dir.iterdir()):
        temp_dir.rmdir()

    return " ".join(full_text)
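
The split above is a plain fixed-length cut: consecutive segments of `segment_duration` seconds, with a shorter final segment. The arithmetic can be sketched without any audio library, which is handy for attaching start times to each piece of the transcript. The helper name `chunk_bounds` is an assumption for illustration.

```python
def chunk_bounds(total_ms, segment_ms):
    """Return (start, end) millisecond pairs covering [0, total_ms).

    Segments are consecutive and non-overlapping; the last one may be
    shorter than segment_ms.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]
```

For a 70-second file and 30-second segments this yields three ranges, the last one 10 seconds long. Note that fixed cuts can land in the middle of a word, so text near segment boundaries deserves a second look.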

5.2 Adding a File Format Check

Make sure only supported audio formats are processed:

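from pathlib import Path


def is_supported_audio(file_path):
    """
    Check whether a file has a supported audio extension.
    """
    # .m4a and .aac are not in the format list in section 2.2;
    # confirm that your service accepts them before relying on this set.
    supported_extensions = {'.wav', '.mp3', '.flac', '.ogg', '.m4a', '.aac'}
    return Path(file_path).suffix.lower() in supported_extensions


def get_audio_duration(file_path):
    """
    Return the duration of an audio file in seconds (requires pydub),
    or None if the file cannot be read.
    """
    try:
        from pydub import AudioSegment
        audio = AudioSegment.from_file(file_path)
        return len(audio) / 1000  # milliseconds to seconds
    except Exception:
        return None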
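
For uncompressed PCM WAV files, the duration can also be read with the standard library alone, which avoids the pydub dependency. This is a minimal sketch; `wav_duration_seconds` is an illustrative name, and the approach does not cover MP3, FLAC, or OGG.

```python
import wave


def wav_duration_seconds(file_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(file_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A duration check like this can decide whether a file goes through the normal path or the segmented path from section 5.1.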

6. Complete Worked Example

The following complete example shows how these pieces fit together in a real project:

import os
import time
import json
from pathlib import Path

import requests


class AudioTranscriber:
    def __init__(self, api_url):
        self.api_url = api_url
        self.results = []

    def transcribe_single(self, audio_path):
        """Transcribe a single audio file."""
        if not os.path.exists(audio_path):
            return f"File not found: {audio_path}"

        try:
            with open(audio_path, 'rb') as f:
                files = {'audio': f}
                # The timeout (seconds) is an assumption; tune it to your audio length.
                response = requests.post(self.api_url, files=files, timeout=300)

            if response.status_code == 200:
                result = response.json()
                return result.get('text', 'Recognition failed')
            else:
                return f"API request failed: {response.status_code}"

        except Exception as e:
            return f"Processing error: {str(e)}"

    def transcribe_batch(self, input_folder, output_file="transcription_results.json"):
        """Transcribe every audio file in a folder."""
        input_path = Path(input_folder)
        audio_files = []

        # Collect all supported audio files
        for ext in ['*.wav', '*.mp3', '*.flac', '*.ogg']:
            audio_files.extend(input_path.glob(ext))

        print(f"Processing {len(audio_files)} audio files...")

        results = []
        for i, audio_file in enumerate(audio_files, 1):
            print(f"[{i}/{len(audio_files)}] Processing: {audio_file.name}")

            start_time = time.time()
            text = self.transcribe_single(str(audio_file))
            processing_time = time.time() - start_time

            result = {
                'filename': audio_file.name,
                'text': text,
                'processing_time': round(processing_time, 2),
                'timestamp': time.strftime('%Y-%m-%d %H:%M:%S')
            }
            results.append(result)

            time.sleep(0.5)  # avoid sending requests too frequently

        # Save the results
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=2)

        print(f"Done! Results saved to: {output_file}")
        return results


# Usage example
if __name__ == "__main__":
    # Create the transcriber
    transcriber = AudioTranscriber(
        api_url="https://gpu-<your-instance-ID>-7860.web.gpu.csdn.net/transcribe"
    )

    # Batch-process the audio files
    results = transcriber.transcribe_batch(
        input_folder="path/to/your/audio/folder",
        output_file="transcription_results.json"
    )

    # Print a summary. Failures are detected by the error prefixes
    # that transcribe_single returns.
    error_prefixes = ('API request failed', 'Processing error',
                      'File not found', 'Recognition failed')
    success_count = sum(1 for r in results if not r['text'].startswith(error_prefixes))
    print("\nSummary:")
    print(f"Total files: {len(results)}")
    print(f"Succeeded: {success_count}")
    print(f"Failed: {len(results) - success_count}")
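
Because `transcribe_batch` saves its results as JSON, a later script can load the file and compute simple statistics. The sketch below relies only on the `text` and `processing_time` fields written above; the function name `summarize_results` and the choice of statistics are illustrative.

```python
import json


def summarize_results(json_path):
    """Summarize a results file written by transcribe_batch."""
    with open(json_path, encoding="utf-8") as f:
        results = json.load(f)

    total_time = sum(r.get("processing_time", 0) for r in results)
    return {
        "files": len(results),
        "total_seconds": round(total_time, 2),
        "average_seconds": round(total_time / len(results), 2) if results else 0.0,
        "total_characters": sum(len(r.get("text", "")) for r in results),
    }
```

The average processing time per file gives a rough basis for estimating how long a larger batch will take on the same deployment.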