Qwen3-ASR-0.6B in Practice: Transcribing Court Hearing Recordings and Highlighting Key Passages

news 2026/3/26 21:29:55

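The pain point in transcribing court hearing recordings: traditional manual transcription is slow and labor-intensive. One hour of audio takes 3-4 hours to transcribe and tidy by hand, and key information is easily missed. This article shows how to use Qwen3-ASR-0.6B to transcribe hearing recordings efficiently and accurately, and to mark key passages automatically.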

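1. Environment Setup and Quick Deployment

1.1 System Requirements and Installation

Qwen3-ASR-0.6B has modest hardware requirements and fits most development environments:

Python version: 3.8 or later
GPU memory: at least 4 GB (it also runs on CPU, but more slowly)
System memory: 8 GB or more recommended

Install the core dependencies: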

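    # Quote the version specifiers so the shell does not treat ">" as a redirect
    pip install "transformers>=4.40.0"
    pip install "torch>=2.0.0"
    pip install "gradio>=4.0.0"
    pip install soundfile librosa  # audio processing
    pip install accelerate         # required by low_cpu_mem_usage=True in section 1.2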
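After installation, it can help to confirm that the interpreter and packages listed above are actually visible to Python. The following is a minimal sketch using only the standard library; `check_environment` and `REQUIRED_PACKAGES` are hypothetical names introduced for illustration, and the check only tests whether each package can be found, not which version is installed.

```python
import importlib.util
import sys

# Hypothetical helper, not part of any library: reports whether the
# interpreter and the packages named in section 1.1 are available.
REQUIRED_PACKAGES = ("transformers", "torch", "gradio", "soundfile", "librosa")


def check_environment(min_python=(3, 8), packages=REQUIRED_PACKAGES):
    """Return a dict describing the Python version and package availability."""
    report = {
        "python_version": ".".join(str(p) for p in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec returns None when a top-level package cannot be found
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }
    report["missing"] = [name for name, ok in report["packages"].items() if not ok]
    return report


if __name__ == "__main__":
    report = check_environment()
    print(f"Python {report['python_version']} ok: {report['python_ok']}")
    print("Missing packages:", report["missing"] or "none")
```

If `missing` is not empty, rerunning the corresponding `pip install` command is the first thing to try.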

1.2 One-Step Deployment Code

Create a simple deployment script; the speech recognition service can be up in about 5 minutes:

    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
    import torch

    # Load the model and processor
    model_id = "Qwen/Qwen3-ASR-0.6B"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # float16 saves GPU memory; fall back to float32 on CPU
    dtype = torch.float16 if device == "cuda" else torch.float32

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id,
        torch_dtype=dtype,
        low_cpu_mem_usage=True,
        use_safetensors=True,
    )
    processor = AutoProcessor.from_pretrained(model_id)
    model.to(device)

    print("✅ Model loaded and ready!")

2. Getting Started with the Basics

2.1 Core Speech Recognition Code

Let's start with the simplest possible speech recognition function:

    import torch
    import librosa

    # Reuses model, processor and device from section 1.2

    def transcribe_audio(audio_path):
        """
        Core speech-to-text function.
        audio_path: path to the audio file
        """
        # Load the audio file, resampled to 16 kHz
        audio_input, sample_rate = librosa.load(audio_path, sr=16000)

        # Convert the waveform into model inputs
        inputs = processor(
            audio_input,
            sampling_rate=sample_rate,
            return_tensors="pt",
            padding=True,
        )

        # Move to the model's device; cast float tensors to the model's dtype
        inputs = {
            k: v.to(device, dtype=model.dtype) if torch.is_floating_point(v) else v.to(device)
            for k, v in inputs.items()
        }

        # Generate the transcription
        with torch.no_grad():
            generated_ids = model.generate(**inputs)

        # Decode the result
        transcription = processor.batch_decode(
            generated_ids,
            skip_special_tokens=True,
        )[0]
        return transcription

    # Usage example
    result = transcribe_audio("test_audio.wav")
    print(f"Transcription: {result}")

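2.2 Testing Your First Transcription

Prepare a short test clip (3-5 seconds) and run the code above. You should see output similar to this:

    Transcription: 本案原告主张被告应支付货款人民币五十万元整

(In English: "The plaintiff in this case claims that the defendant should pay RMB 500,000 for the goods.")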

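3. Transcribing Court Hearing Recordings in Practice

3.1 Characteristics of Hearing Recordings and Handling Tips

Court hearing recordings have distinctive characteristics that need special attention:

Multi-party dialogue: the judge, plaintiff, defendant, and witnesses speak in turn
Specialized terminology: a large amount of legal vocabulary and phrasing
Audio quality: background noise, overlapping voices, and similar problems may be present

With these characteristics in mind, we adjust the recognition code: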

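    def transcribe_court_audio(audio_path, language="zh"):
        """
        Transcription function tailored to court hearing recordings.
        Reuses model, processor and device from section 1.2.
        """
        # Load and preprocess the audio
        audio, sr = librosa.load(audio_path, sr=16000)

        # Optional enhancement: pre-emphasis boosts the high frequencies
        audio = librosa.effects.preemphasis(audio)

        # Parameters better suited to legal scenarios
        inputs = processor(
            audio,
            sampling_rate=sr,
            return_tensors="pt",
            padding=True,
            language=language,
        )

        # Generation settings
        generate_kwargs = {
            "max_length": 448,  # allow a longer output
            "num_beams": 5,     # beam search for higher accuracy
        }

        # Move to the model's device; cast float tensors to the model's dtype
        inputs = {
            k: v.to(device, dtype=model.dtype) if torch.is_floating_point(v) else v.to(device)
            for k, v in inputs.items()
        }

        with torch.no_grad():
            generated_ids = model.generate(**inputs, **generate_kwargs)

        transcription = processor.batch_decode(
            generated_ids,
            skip_special_tokens=True,
        )[0]
        return transcription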

3.2 Batch Processing Hearing Recordings

Real work usually involves more than one audio file:

    import os
    from pathlib import Path

    def batch_transcribe_court_audios(audio_dir, output_dir="transcriptions"):
        """
        Transcribe every .wav hearing recording in a directory.
        """
        os.makedirs(output_dir, exist_ok=True)
        audio_files = list(Path(audio_dir).glob("*.wav"))
        results = []

        for audio_file in audio_files:
            print(f"Processing file: {audio_file.name}")
            try:
                transcription = transcribe_court_audio(str(audio_file))

                # Save the result next to the others, one .txt per recording
                output_file = Path(output_dir) / f"{audio_file.stem}.txt"
                with open(output_file, "w", encoding="utf-8") as f:
                    f.write(transcription)

                results.append({
                    "file": audio_file.name,
                    "transcription": transcription,
                    "status": "success",
                })
            except Exception as e:
                # Record the failure and continue with the next file
                results.append({
                    "file": audio_file.name,
                    "error": str(e),
                    "status": "failed",
                })
        return results

    # Batch processing example
    results = batch_transcribe_court_audios("court_recordings/")
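The list returned by the batch function mixes successful and failed files, so a short summary makes it easier to see what needs a rerun. The sketch below assumes only the result shape shown above (dicts with `file`, `status`, and optionally `error`); `summarize_batch` and the optional `report_path` argument are hypothetical names introduced for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_batch(results, report_path=None):
    """Count statuses and collect the failed files from a list of result dicts."""
    counts = Counter(item["status"] for item in results)
    summary = {
        "total": len(results),
        "success": counts.get("success", 0),
        "failed": counts.get("failed", 0),
        "failures": [
            {"file": item["file"], "error": item.get("error", "")}
            for item in results
            if item["status"] == "failed"
        ],
    }
    # Optionally persist the summary as UTF-8 JSON
    if report_path is not None:
        Path(report_path).write_text(
            json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return summary


# Made-up sample data in the same shape as the batch results
sample = [
    {"file": "a.wav", "transcription": "...", "status": "success"},
    {"file": "b.wav", "error": "file is corrupted", "status": "failed"},
]
print(summarize_batch(sample))
```

Passing a `report_path` writes the same summary to disk, which can be useful when a batch runs unattended.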

4. Highlighting Key Passages

4.1 Automatic Detection of Legal Keywords

In legal documents, certain keywords and passages carry particular weight:

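    import re

    # The keywords stay in Chinese because they are matched against Chinese transcripts
    LEGAL_KEYWORDS = [
        "证据", "证言", "质证", "辩论",      # evidence, testimony, cross-examination, debate
        "判决", "裁定", "上诉", "抗诉",      # judgment, ruling, appeal, prosecutorial protest
        "和解", "调解", "违约", "侵权",      # settlement, mediation, breach of contract, tort
        "赔偿", "违约金", "利息", "诉讼费",  # compensation, liquidated damages, interest, court fees
        "律师费", "事实认定", "法律适用", "争议焦点",  # attorney fees, fact-finding, application of law, issues in dispute
    ]

    def highlight_legal_keywords(text, keywords=LEGAL_KEYWORDS):
        """
        Highlight legal keywords by wrapping them in **[...]** markers.
        """
        if not keywords:
            return text, []

        # Match longer keywords first so "违约金" is not split by its prefix "违约"
        ordered = sorted(keywords, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in ordered))
        found_keywords = []

        def mark(match):
            keyword = match.group(0)
            if keyword not in found_keywords:
                found_keywords.append(keyword)
            return f"**[{keyword}]**"  # bold markers act as the highlight

        highlighted_text = pattern.sub(mark, text)
        return highlighted_text, found_keywords

    # Usage example
    transcription = "原告提供的证据包括合同书和银行转账记录，被告对证据真实性提出质证"
    highlighted, keywords = highlight_legal_keywords(transcription)
    print(highlighted)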

Output:

原告提供的**[证据]**包括合同书和银行转账记录，被告对**[证据]**真实性提出**[质证]**
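Keyword markers show where a term occurs, but a reviewer usually wants the whole sentence around it. The following is a minimal sketch of sentence-level extraction, assuming that sentences in the transcript end with common Chinese or ASCII punctuation; `extract_key_sentences` is a hypothetical helper, and the short keyword list is only an illustrative subset.

```python
import re

# Split after sentence-ending punctuation (zero-width split, Python 3.7+)
_SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def extract_key_sentences(text, keywords):
    """Return (sentence, matched_keywords) pairs for sentences containing a keyword."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    key_sentences = []
    for sentence in sentences:
        matched = [k for k in keywords if k in sentence]
        if matched:
            key_sentences.append((sentence, matched))
    return key_sentences


# Illustrative input: three sentences, only the second contains keywords
text = "现在开庭。原告提供的证据包括合同书和银行转账记录，被告对证据真实性提出质证。请证人入场。"
for sentence, matched in extract_key_sentences(text, ["证据", "质证", "赔偿"]):
    print(matched, sentence)
```

Automatic transcripts do not always contain reliable punctuation, so this approach only works as well as the punctuation in the model's output.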

4.2 Timestamps and Locating Key Passages

For long recordings, timestamps are especially important:

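    import re

    def transcribe_with_timestamps(audio_path):
        """
        Transcription with timestamps (simulated implementation).
        Real use needs to be combined with Qwen3-ForcedAligner-0.6B.
        """
        # Simplified here; a real implementation should use the forced-alignment model
        transcription = transcribe_court_audio(audio_path)

        # Chinese text has no spaces, so str.split() would return a single chunk.
        # Split around the keywords instead, keeping each keyword as its own chunk.
        ordered = sorted(LEGAL_KEYWORDS, key=len, reverse=True)
        pattern = re.compile("(" + "|".join(map(re.escape, ordered)) + ")")
        chunks = [c for c in pattern.split(transcription) if c.strip()]

        # Simulated timestamps: assume 0.5 s per chunk (a placeholder, not a measurement)
        timestamped_segments = []
        current_time = 0.0
        for chunk in chunks:
            timestamped_segments.append({
                "text": chunk,
                "start": current_time,
                "end": current_time + 0.5,
                "is_keyword": chunk in LEGAL_KEYWORDS,
            })
            current_time += 0.5
        return timestamped_segments

    def find_key_segments(timestamped_segments, min_duration=2.0):
        """
        Find key passages: runs of consecutive keyword segments
        that last at least min_duration seconds.
        """
        key_segments = []
        current_segment = None

        # The trailing sentinel flushes a run that ends with the last segment
        for segment in list(timestamped_segments) + [{"is_keyword": False}]:
            if segment["is_keyword"]:
                if current_segment is None:
                    current_segment = {
                        "start": segment["start"],
                        "end": segment["end"],
                        "keywords": [segment["text"]],
                        "text": segment["text"],
                    }
                else:
                    current_segment["end"] = segment["end"]
                    current_segment["keywords"].append(segment["text"])
                    current_segment["text"] += " " + segment["text"]
            elif current_segment is not None:
                # Check how long the run lasts
                duration = current_segment["end"] - current_segment["start"]
                if duration >= min_duration:
                    key_segments.append(current_segment)
                current_segment = None
        return key_segments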
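Raw second counts such as 1764.5 are hard to read when locating a passage in a long hearing. As a small illustration, the sketch below formats the `start` and `end` values of a segment dict as `HH:MM:SS.mmm`; `format_timestamp` and `describe_segment` are hypothetical helpers, and the sample segment is made up.

```python
def format_timestamp(seconds):
    """Format a non-negative number of seconds as HH:MM:SS.mmm."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def describe_segment(segment):
    """Render one segment dict (keys: start, end, text) as a readable line."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} - {end}] {segment['text']}"


# Made-up segment in the same shape as the timestamped segments
print(describe_segment({"start": 65.5, "end": 66.0, "text": "证据"}))
# prints: [00:01:05.500 - 00:01:06.000] 证据
```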

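5. Building the Gradio Front End

5.1 Building a Complete Web Application

Create a user-friendly interface so that non-technical staff can use the tool too: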

    import gradio as gr

    def process_audio(audio_file, highlight_keywords=True):
        """
        Process an uploaded audio file.
        """
        if audio_file is None:
            return "Please upload an audio file", ""

        try:
            # Transcribe the audio
            transcription = transcribe_court_audio(audio_file)

            # Highlight keywords
            if highlight_keywords:
                highlighted, keywords = highlight_legal_keywords(transcription)
                keywords_str = ", ".join(keywords) if keywords else "No keywords found"
            else:
                highlighted = transcription
                keywords_str = "Keyword highlighting is disabled"
            return highlighted, keywords_str
        except Exception as e:
            return f"Processing failed: {e}", ""

    # Build the Gradio interface
    with gr.Blocks(title="Court Hearing Transcription System") as demo:
        gr.Markdown("# 🎯 Court Hearing Recording Transcription System")
        gr.Markdown("Upload a hearing recording to transcribe it and mark key legal passages")

        with gr.Row():
            with gr.Column():
                audio_input = gr.Audio(
                    label="Upload hearing recording",
                    type="filepath",
                    sources=["upload", "microphone"],
                )
                highlight_checkbox = gr.Checkbox(
                    label="Highlight legal keywords",
                    value=True,
                )
                process_btn = gr.Button("Start transcription", variant="primary")

            with gr.Column():
                output_text = gr.Textbox(
                    label="Transcription",
                    lines=10,
                    placeholder="The transcription will appear here...",
                )
                keywords_output = gr.Textbox(
                    label="Detected keywords",
                    placeholder="Detected legal keywords will appear here...",
                )

        # Examples (example1.wav and example2.wav must exist next to this script)
        gr.Examples(
            examples=[
                ["example1.wav", "Evidence, cross-examination and debate example"],
                ["example2.wav", "Breach-of-contract compensation dispute example"],
            ],
            inputs=[audio_input, gr.Textbox(visible=False)],
            label="Try the sample audio",
        )

        process_btn.click(
            fn=process_audio,
            inputs=[audio_input, highlight_checkbox],
            outputs=[output_text, keywords_output],
        )

    # Launch the app
    if __name__ == "__main__":
        demo.launch(server_name="0.0.0.0", server_port=7860)

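5.2 Interface Features

The web interface provides the following features:

Multiple audio input methods: upload a file or record directly
On-demand transcription: click the button to start processing the audio
Keyword highlighting: important legal terms are marked automatically
Sample audio: example files are provided for testing
Responsive design: adapts to different screen sizes

After launching, open http://localhost:7860 to use it.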
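A plain text box shows the `**[keyword]**` markers literally rather than as visual highlights. One possible approach, sketched below under the assumption that the output is sent to an HTML-capable component (for example Gradio's `gr.HTML`), is to convert the markers into `<mark>` tags. `markers_to_html` is a hypothetical helper; the text is escaped first so that characters from the transcript cannot be interpreted as markup.

```python
import html
import re

# Matches the **[keyword]** markers produced by the highlighting step
_MARKER = re.compile(r"\*\*\[(.+?)\]\*\*")


def markers_to_html(highlighted_text):
    """Escape the text, then turn **[keyword]** markers into <mark> tags."""
    escaped = html.escape(highlighted_text)
    return _MARKER.sub(r"<mark>\1</mark>", escaped)


print(markers_to_html("被告对**[证据]**真实性提出**[质证]**"))
# prints: 被告对<mark>证据</mark>真实性提出<mark>质证</mark>
```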

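6. Case Study: Processing a Complete Hearing Recording

6.1 A Realistic Scenario

Suppose we have a 30-minute hearing recording with the following typical content:

Judge: opens the session and guides the procedure
Plaintiff: states the facts and presents evidence
Defendant: cross-examines the evidence and argues
Witness: gives testimony

Processing workflow: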

    from datetime import datetime

    # Process a long audio file (example)
    long_audio_path = "court_session_30min.wav"

    print("Processing the 30-minute hearing recording...")
    transcription = transcribe_court_audio(long_audio_path)

    print("Marking key legal passages...")
    highlighted_text, keywords = highlight_legal_keywords(transcription)

    print("Generating timestamp information...")
    # Note: this call transcribes the audio a second time
    timestamped_segments = transcribe_with_timestamps(long_audio_path)
    key_segments = find_key_segments(timestamped_segments)

    print(f"Transcription finished! Detected {len(keywords)} keywords")
    print(f"Identified {len(key_segments)} key passages")

    # Save the full result as a Markdown report
    with open("court_transcription_full.md", "w", encoding="utf-8") as f:
        f.write("# Court Hearing Transcription Report\n\n")
        f.write(f"**Audio file**: {long_audio_path}\n")
        f.write(f"**Processed at**: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n")
        f.write(f"**Detected keywords**: {', '.join(keywords)}\n\n")
        f.write("## Full transcription\n")
        f.write(highlighted_text + "\n\n")
        f.write("## Key passage summary\n")
        for i, segment in enumerate(key_segments, 1):
            f.write(f"{i}. [{segment['start']:.1f}s-{segment['end']:.1f}s] ")
            f.write(f"{segment['text']} ")
            f.write(f"(keywords: {', '.join(segment['keywords'])})\n")
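A 30-minute recording is far longer than the short clips used earlier, and a single pass with a fixed output length limit may not cover it. A common workaround is to split the audio into overlapping windows and transcribe each one in turn. The sketch below only plans the windows; `plan_chunks` is a hypothetical helper, and the 30-second window and 2-second overlap are illustrative assumptions rather than values recommended for this model.

```python
def plan_chunks(total_seconds, chunk_seconds=30.0, overlap_seconds=2.0):
    """Return (start, end) windows in seconds that cover the whole recording."""
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    step = chunk_seconds - overlap_seconds
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the recording
        start += step
    return chunks


# A 70-second recording needs three overlapping windows
print(plan_chunks(70))
# prints: [(0.0, 30.0), (28.0, 58.0), (56.0, 70)]
```

With audio loaded at 16 kHz, each window corresponds to the sample range `audio[int(start * 16000):int(end * 16000)]`. Text recognized inside the overlap can appear twice, so the pieces need deduplication when they are joined.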