Qwen3-ASR-1.7B Hands-On Tutorial: Integrating with WeCom and DingTalk to Archive Meeting Audio Automatically
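1. Tutorial Overview
Organizing and archiving meeting recordings is a time-consuming chore in most companies. Manual transcription is slow and error-prone. This tutorial shows how to use the Qwen3-ASR-1.7B speech recognition system to transcribe and archive WeCom (企业微信) and DingTalk (钉钉) meeting recordings automatically.
In this tutorial you will learn how to:
- Deploy the Qwen3-ASR-1.7B speech recognition service quickly
- Configure automatic retrieval of recording files from WeCom and DingTalk
- Build a complete pipeline that transcribes meeting recordings automatically
- Archive the recognition results to a designated storage location
The setup is a good fit for teams that meet often and need written records of each meeting: it cuts the time spent on transcription and helps ensure that important meeting content is not lost.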
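2. Environment Setup and Quick Deployment
2.1 System Requirements
First make sure your server meets the following requirements:
- Operating system: Ubuntu 20.04 or later
- GPU: NVIDIA card with at least 24 GB of VRAM (RTX 4090 or A100 recommended)
- RAM: at least 32 GB
- Storage: at least 50 GB of free space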
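Some of these requirements can be checked before running the deployment script. The following is a minimal sketch, not part of the original tutorial: the function name `check_environment` and its thresholds are illustrative assumptions, and it only verifies the Python version, the presence of `ffmpeg` and `nvidia-smi` on the PATH, and free disk space. It does not measure VRAM or RAM.

```python
import shutil
import sys


def check_environment(path="/", min_free_gb=50, min_python=(3, 9)):
    """Return a dict of basic pre-deployment checks (True means the check passed).

    The name and thresholds are illustrative assumptions, not part of the tutorial.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_version": sys.version_info[:2] >= min_python,
        "ffmpeg_installed": shutil.which("ffmpeg") is not None,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "disk_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

A failed `ffmpeg_installed` check before deployment is expected, since the deployment script installs FFmpeg itself.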
2.2 One-Step Deployment Script
Use the following script to deploy the Qwen3-ASR-1.7B service:
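```bash
#!/bin/bash
set -e  # stop at the first failing command

# Create the project directory
mkdir -p /opt/qwen-asr
cd /opt/qwen-asr

# Install system dependencies (git-lfs is needed to fetch the model weights)
apt update
apt install -y python3.9 python3.9-venv ffmpeg git git-lfs

# Create and activate the virtual environment
python3.9 -m venv venv
source venv/bin/activate

# Install PyTorch and the Python dependencies.
# The version specifier is quoted so the shell does not treat ">" as a redirect.
# requests and schedule are used by the pipeline scripts later in this tutorial.
pip install torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
pip install "transformers>=4.35.0" fastapi uvicorn python-multipart requests schedule

# Download the model (access to the model must be granted beforehand)
git lfs install
git clone https://huggingface.co/Qwen/Qwen3-ASR-1.7B model

echo "Deployment complete. Next, configure the service..."
```
2.3 Starting the Recognition Service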
Create the startup script start_service.py:
```python
import io

import torch
import torchaudio
from fastapi import FastAPI, File, UploadFile
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

app = FastAPI(title="Qwen3-ASR-1.7B Service")

# Load the model and processor
model_path = "/opt/qwen-asr/model"
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_path)


@app.post("/transcribe")
async def transcribe_audio(file: UploadFile = File(...)):
    # Read the uploaded audio file
    audio_data = await file.read()
    audio_input, sample_rate = torchaudio.load(io.BytesIO(audio_data))

    # Downmix to mono so that stereo recordings also yield a 1-D waveform
    waveform = audio_input.mean(dim=0)

    # Preprocess the audio
    inputs = processor(
        waveform.numpy(),
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True,
    )
    # Move tensors to the model's device; floating-point tensors are cast to its dtype
    inputs = inputs.to(device, dtype=torch_dtype)

    # Transcribe. max_length caps the output length, so long recordings
    # need to be split into shorter segments before they are sent here.
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=448)

    transcription = processor.batch_decode(
        generated_ids,
        skip_special_tokens=True,
    )[0]
    return {"text": transcription}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Start the service:
```bash
python start_service.py
```
3. WeCom Integration
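3.1 Obtaining WeCom API Permissions
First configure API permissions in the WeCom admin console:
- Log in to the WeCom admin console
- Go to "App Management" → "Self-built Apps"
- Create a new app and note the following values:
  - CorpID (enterprise ID)
  - Secret (app secret)
  - AgentId (app ID)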
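Credentials like these are usually kept out of source files. As a minimal sketch under that assumption, the snippet below reads them from environment variables; the variable names (`WECOM_CORP_ID`, `WECOM_CORP_SECRET`, `WECOM_AGENT_ID`) and the helper names are hypothetical and not defined by WeCom or by this tutorial.

```python
import os
from dataclasses import dataclass

# Hypothetical environment variable names; pick whatever fits your deployment.
ENV_NAMES = {
    "corp_id": "WECOM_CORP_ID",
    "corp_secret": "WECOM_CORP_SECRET",
    "agent_id": "WECOM_AGENT_ID",
}


@dataclass(frozen=True)
class WeComCredentials:
    corp_id: str
    corp_secret: str
    agent_id: str


def missing_wecom_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in ENV_NAMES.values() if not env.get(var)]


def load_wecom_credentials(env=None):
    """Build WeComCredentials from the environment; raise if anything is missing."""
    env = os.environ if env is None else env
    missing = missing_wecom_variables(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return WeComCredentials(**{field: env[var] for field, var in ENV_NAMES.items()})
```

The resulting object can be passed on as `corp_id=creds.corp_id` and so on, instead of writing the values into the script.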
3.2 Pulling Meeting Recordings Automatically
Create the WeCom recording script wechat_processor.py:
```python
from datetime import datetime

import requests


class WeChatMeetingProcessor:
    def __init__(self, corp_id, corp_secret, agent_id):
        self.corp_id = corp_id
        self.corp_secret = corp_secret
        self.agent_id = agent_id
        self.access_token = self.get_access_token()

    def get_access_token(self):
        url = "https://qyapi.weixin.qq.com/cgi-bin/gettoken"
        params = {"corpid": self.corp_id, "corpsecret": self.corp_secret}
        response = requests.get(url, params=params, timeout=10)
        return response.json().get("access_token")

    def get_meeting_records(self, start_time, end_time):
        """Fetch the meeting records within the given time range."""
        url = f"https://qyapi.weixin.qq.com/cgi-bin/meeting/get_record?access_token={self.access_token}"
        data = {"start_time": start_time, "end_time": end_time, "limit": 100}
        response = requests.post(url, json=data, timeout=30)
        return response.json().get("record", [])

    def download_record(self, record_id, save_path):
        """Download a meeting recording."""
        url = f"https://qyapi.weixin.qq.com/cgi-bin/meeting/record_download?access_token={self.access_token}"
        data = {"record_id": record_id}
        response = requests.post(url, json=data, timeout=300)
        # WeCom generally reports API errors as a JSON body, so a JSON
        # response is treated as a failure rather than saved as audio.
        is_json = response.headers.get("Content-Type", "").startswith("application/json")
        if response.status_code == 200 and not is_json:
            with open(save_path, "wb") as f:
                f.write(response.content)
            return True
        return False


# Usage example (guarded so that importing this module has no side effects)
if __name__ == "__main__":
    wechat_processor = WeChatMeetingProcessor(
        corp_id="your CorpID",
        corp_secret="your app Secret",
        agent_id="your AgentId",
    )

    # Fetch all of today's meeting records
    today = datetime.now().strftime("%Y-%m-%d")
    records = wechat_processor.get_meeting_records(
        f"{today} 00:00:00",
        f"{today} 23:59:59",
    )
```
4. DingTalk Integration
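4.1 Configuring the DingTalk Open Platform
- Log in to the DingTalk Open Platform (https://open.dingtalk.com)
- Create an internal enterprise app
- Note the following credentials:
  - AppKey
  - AppSecret
  - AgentId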
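Access tokens obtained with these credentials are valid only for a limited time; the `gettoken` response reports the lifetime in its `expires_in` field. A service that runs for days therefore needs to refresh the token rather than fetch it once at startup. The class below is a minimal sketch of that idea. `TokenCache`, its `fetch` callback, and the 300-second safety margin are illustrative assumptions, not part of any DingTalk or WeCom SDK.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires.

    `fetch` is a caller-supplied function returning (token, expires_in_seconds).
    """

    def __init__(self, fetch, clock=time.monotonic, margin=300):
        self._fetch = fetch
        self._clock = clock      # injectable so the cache can be tested without waiting
        self._margin = margin    # refresh this many seconds before the real expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, calling `fetch` only when the cached one is stale."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + max(expires_in - self._margin, 0)
        return self._token
```

To use it, wrap the token request in a function that returns the `access_token` and `expires_in` values from the JSON response, and call `cache.get()` before each API request instead of reading a stored attribute.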
4.2 Handling DingTalk Meeting Recordings
Create the DingTalk script dingtalk_processor.py:
```python
import requests


class DingTalkMeetingProcessor:
    def __init__(self, app_key, app_secret, agent_id):
        self.app_key = app_key
        self.app_secret = app_secret
        self.agent_id = agent_id
        self.access_token = self.get_access_token()

    def get_access_token(self):
        url = "https://oapi.dingtalk.com/gettoken"
        params = {"appkey": self.app_key, "appsecret": self.app_secret}
        response = requests.get(url, params=params, timeout=10)
        return response.json().get("access_token")

    def get_meeting_list(self, start_time, end_time):
        """Fetch the meeting list."""
        url = "https://oapi.dingtalk.com/topapi/meeting/list"
        headers = {"Content-Type": "application/json"}
        data = {
            "start_time": start_time,
            "end_time": end_time,
            "cursor": 0,
            "size": 100,
        }
        response = requests.post(
            url,
            json=data,
            headers=headers,
            params={"access_token": self.access_token},
            timeout=30,
        )
        return response.json().get("result", {}).get("items", [])

    def download_meeting_record(self, meeting_id, save_path):
        """Download a meeting recording."""
        url = "https://oapi.dingtalk.com/topapi/meeting/record/get"
        data = {"meeting_id": meeting_id}
        response = requests.post(
            url,
            json=data,
            params={"access_token": self.access_token},
            timeout=30,
        )
        record_url = response.json().get("result", {}).get("record_url")
        if record_url:
            audio_response = requests.get(record_url, timeout=300)
            # Only save the body when the download itself succeeded
            if audio_response.status_code == 200:
                with open(save_path, "wb") as f:
                    f.write(audio_response.content)
                return True
        return False


# Usage example (guarded so that importing this module has no side effects)
if __name__ == "__main__":
    dingtalk_processor = DingTalkMeetingProcessor(
        app_key="your AppKey",
        app_secret="your AppSecret",
        agent_id="your AgentId",
    )
```
5. Complete Automation Pipeline
5.1 Main Scheduler Script
Create the main script meeting_auto_transcribe.py:
```python
import os
import time
from datetime import datetime, timedelta

import requests
import schedule

from dingtalk_processor import DingTalkMeetingProcessor
from wechat_processor import WeChatMeetingProcessor


class MeetingAutoTranscriber:
    def __init__(self):
        # Initialize the platform processors
        self.wechat_processor = WeChatMeetingProcessor(
            corp_id="WeCom CorpID",
            corp_secret="WeCom app Secret",
            agent_id="WeCom AgentId",
        )
        self.dingtalk_processor = DingTalkMeetingProcessor(
            app_key="DingTalk AppKey",
            app_secret="DingTalk AppSecret",
            agent_id="DingTalk AgentId",
        )
        # Create the storage directories
        os.makedirs("audio_files", exist_ok=True)
        os.makedirs("transcriptions", exist_ok=True)

    def transcribe_audio(self, audio_path):
        """Call the ASR service to transcribe an audio file."""
        url = "http://localhost:8000/transcribe"
        with open(audio_path, "rb") as f:
            response = requests.post(url, files={"file": f}, timeout=600)
        if response.status_code == 200:
            return response.json().get("text", "")
        return None

    def process_wechat_meetings(self):
        """Process WeCom meetings."""
        print("Processing WeCom meetings...")
        # Look at the meetings of the last 2 hours
        end_time = datetime.now()
        start_time = end_time - timedelta(hours=2)
        records = self.wechat_processor.get_meeting_records(
            start_time.strftime("%Y-%m-%d %H:%M:%S"),
            end_time.strftime("%Y-%m-%d %H:%M:%S"),
        )
        for record in records:
            record_id = record["record_id"]
            meeting_topic = record["meeting_topic"]
            result_path = f"transcriptions/wechat_{record_id}.txt"
            # The 2-hour window overlaps between runs, so skip finished recordings
            if os.path.exists(result_path):
                continue
            # Download the recording
            audio_path = f"audio_files/wechat_{record_id}.mp3"
            if self.wechat_processor.download_record(record_id, audio_path):
                # Transcribe
                transcription = self.transcribe_audio(audio_path)
                if transcription:
                    # Save the result
                    with open(result_path, "w", encoding="utf-8") as f:
                        f.write(f"Meeting topic: {meeting_topic}\n")
                        f.write(f"Meeting time: {record['meeting_time']}\n")
                        f.write(f"Transcript:\n{transcription}\n")
                    print(f"Transcription finished: {meeting_topic}")

    def process_dingtalk_meetings(self):
        """Process DingTalk meetings."""
        print("Processing DingTalk meetings...")
        end_time = int(time.time() * 1000)
        start_time = end_time - 2 * 60 * 60 * 1000  # 2 hours ago, in milliseconds
        meetings = self.dingtalk_processor.get_meeting_list(start_time, end_time)
        for meeting in meetings:
            meeting_id = meeting["meeting_id"]
            meeting_title = meeting["title"]
            result_path = f"transcriptions/dingtalk_{meeting_id}.txt"
            # Skip meetings that were already transcribed in an earlier run
            if os.path.exists(result_path):
                continue
            audio_path = f"audio_files/dingtalk_{meeting_id}.mp3"
            if self.dingtalk_processor.download_meeting_record(meeting_id, audio_path):
                transcription = self.transcribe_audio(audio_path)
                if transcription:
                    with open(result_path, "w", encoding="utf-8") as f:
                        f.write(f"Meeting topic: {meeting_title}\n")
                        f.write(f"Transcript:\n{transcription}\n")
                    print(f"Transcription finished: {meeting_title}")

    def run(self):
        """Run one complete processing pass."""
        self.process_wechat_meetings()
        self.process_dingtalk_meetings()

    def start_scheduler(self):
        """Start the scheduled job."""
        # Run every 30 minutes
        schedule.every(30).minutes.do(self.run)
        print("Meeting auto-transcription service started; running every 30 minutes...")
        while True:
            schedule.run_pending()
            time.sleep(1)


if __name__ == "__main__":
    transcriber = MeetingAutoTranscriber()
    transcriber.start_scheduler()
```
5.2 System Service Configuration
Create the systemd unit file /etc/systemd/system/meeting-transcriber.service:
```ini
[Unit]
Description=Meeting Auto Transcription Service
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/qwen-asr
# Use the virtual environment's interpreter so the installed packages are found
ExecStart=/opt/qwen-asr/venv/bin/python /opt/qwen-asr/meeting_auto_transcribe.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable meeting-transcriber
sudo systemctl start meeting-transcriber
```
6. Common Problems and Solutions
6.1 Audio Format Issues
If an audio format is not supported, convert it with FFmpeg:
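```python
import subprocess


def convert_audio_format(input_path, output_path):
    """Convert audio to 16 kHz mono 16-bit PCM; the container follows output_path's extension."""
    cmd = [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", input_path,
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        output_path,
    ]
    subprocess.run(cmd, check=True)


# Usage example
convert_audio_format("input.m4a", "output.wav")
```
6.2 Handling Network Timeouts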
Add a retry mechanism to network requests:
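```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def create_session_with_retry():
    """Create a session that retries failed requests."""
    session = requests.Session()
    # Note: by default urllib3 retries only idempotent methods on these status
    # codes, so POST requests are not retried on them unless that is configured.
    retry_strategy = Retry(
        total=3,
        backoff_factor=0.5,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```
6.3 Memory Optimization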
For a long-running service, add a memory cleanup step:
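```python
import gc

import torch


def cleanup_memory():
    """Release unused Python objects and cached GPU memory."""
    gc.collect()  # drop unreachable objects first so their GPU tensors can be freed
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# Call after a batch has been processed
cleanup_memory()
```
7. Summary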
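With this tutorial you have built a complete system for archiving meeting audio automatically. The system provides:
- Automatic monitoring: picks up meeting recordings from WeCom and DingTalk
- Transcription: uses Qwen3-ASR-1.7B for high-accuracy speech recognition
- Structured archiving: stores transcripts organized by time and platform
- Continuous operation: runs around the clock as a system service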
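The scripts in this tutorial write every transcript into one flat `transcriptions` directory, with the platform encoded only in the file name. To organize the archive by platform and date, the output path could be built as in the sketch below. The function name `archive_path` and the `<root>/<platform>/<YYYY>/<MM>/<DD>/` layout are illustrative assumptions, not something the tutorial prescribes.

```python
from datetime import datetime
from pathlib import Path


def archive_path(root, platform, meeting_id, when):
    """Build <root>/<platform>/<YYYY>/<MM>/<DD>/<meeting_id>.txt.

    Characters other than letters, digits, '-' and '_' in the meeting ID are
    replaced with '_' so the ID cannot introduce extra path components.
    """
    safe_id = "".join(c if c.isalnum() or c in "-_" else "_" for c in str(meeting_id))
    return Path(root) / platform / f"{when:%Y}" / f"{when:%m}" / f"{when:%d}" / f"{safe_id}.txt"


if __name__ == "__main__":
    path = archive_path("transcriptions", "wechat", "rec/42", datetime(2024, 3, 5, 14, 30))
    print(path.as_posix())  # transcriptions/wechat/2024/03/05/rec_42.txt
```

Before writing the file, create the parent directories with `path.parent.mkdir(parents=True, exist_ok=True)`.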
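Results in practice:
- Transcription accuracy can exceed 90%, particularly for Chinese-language meetings; the actual figure depends on audio quality
- Dozens of meeting recordings can be processed per hour, depending on their length and the hardware
- The time spent on manual transcription drops substantially
- Important meeting content is far less likely to go unrecorded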
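Suggested next steps:
- Add email or chat notifications so the relevant people are informed when a transcript is ready
- Integrate with the company knowledge base to make transcripts searchable
- Add speaker diarization to tell apart what each participant said
- Refine the storage strategy and clean up old audio files periodically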
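The last suggestion can be sketched in a few lines. The helper below deletes files whose modification time is older than a given number of days. The name `remove_old_files` and the 30-day default are illustrative assumptions; run it against the `audio_files` directory only after confirming that the transcripts have been archived.

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_days=30, now=None):
    """Delete regular files in `directory` older than max_age_days.

    Returns the sorted names of the deleted files. Subdirectories are left alone.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

It could be registered alongside the transcription job, for example with `schedule.every().day.do(remove_old_files, "audio_files")`.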
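The approach is not limited to company meetings. It also applies to online education, customer service, and other scenarios that need speech converted to text.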
