
FRCRN Speech Denoising Deployment Tutorial: Making Efficient Use of GPU Compute on Ubuntu + CUDA

news 2026/5/12 2:14:15


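Have you run into this problem? Speech recorded in a café, on the subway, or at home carries persistent background noise that muddies the voice. In post-processing, traditional denoising either barely helps or leaves the voice sounding unnatural.

The FRCRN speech denoising tool introduced here addresses this. It is built on a model open-sourced by Alibaba DAMO Academy that denoises single-channel audio, handles a wide range of complex background noise, and keeps the voice clear.

The tool also supports GPU acceleration. If you have an Ubuntu server with an NVIDIA GPU, you can use that hardware to speed up processing considerably. This tutorial walks step by step through deploying and using the tool on Ubuntu + CUDA so that you can get high-quality denoising results.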

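1. Environment Setup and Quick Deployment

Before starting, let's look at what you need. The process is not complicated; if you follow the steps, it takes about half an hour.

1.1 Hardware and System Requirements

First confirm that your environment meets these requirements:

Operating system: Ubuntu 18.04 or later (20.04 LTS recommended)
GPU: NVIDIA GPU (4 GB of VRAM or more recommended)
Memory: at least 8 GB RAM
Storage: at least 10 GB of free space

If you are using a cloud server, make sure you chose a GPU instance. On a local machine, confirm that the GPU driver is installed.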
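The memory and storage minimums above can be checked from Python before installing anything. This is a minimal sketch using only the standard library; the function name `check_resources` and the 10% tolerance on RAM are illustrative assumptions, not part of the tool.

```python
import os
import shutil

GIB = 1024 ** 3


def check_resources(path=".", min_ram_gib=8, min_disk_gib=10):
    """Report whether total RAM and free disk space meet the minimums."""
    # Total physical memory via POSIX sysconf (works on Linux).
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return {
        "ram_gib": ram_bytes / GIB,
        "disk_free_gib": free_bytes / GIB,
        # A nominal 8 GB machine reports slightly less than 8 GiB,
        # so allow a 10% tolerance (an assumption of this sketch).
        "ram_ok": ram_bytes >= 0.9 * min_ram_gib * GIB,
        "disk_ok": free_bytes >= min_disk_gib * GIB,
    }
```

Calling `check_resources("/home")` checks the filesystem that holds your home directory, which is where the model cache and virtual environment usually end up.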

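1.2 Installing the NVIDIA Driver and CUDA

This step is what makes GPU acceleration possible. If both are already installed, skip this section.

Check the current driver status:

    nvidia-smi

If GPU information is displayed, the driver is installed. If the command is not found, install the driver first.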
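The same check can be scripted, for example in a provisioning script. The sketch below relies on `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`; the helper names and the 4096 MiB threshold (taken from the 4 GB recommendation above) are illustrative choices.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV rows into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if the driver is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


def has_enough_vram(gpus, min_mib=4096):
    """True if at least one GPU has the recommended amount of VRAM."""
    return any(mem >= min_mib for _, mem in gpus)
```

An empty list from `list_gpus()` corresponds to the "command not found" case described above.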

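Install the NVIDIA driver (using Ubuntu 20.04 as an example):

    # Add the graphics drivers PPA
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update

    # List the recommended driver versions
    ubuntu-drivers devices

    # Install the recommended driver (nvidia-driver-470 is used as an example)
    sudo apt install nvidia-driver-470

    # Reboot the system
    sudo reboot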

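Install the CUDA Toolkit:

    # Download the CUDA version you need from the NVIDIA website.
    # CUDA 11.3 with the runfile installer is used as an example.
    wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
    sudo sh cuda_11.3.0_465.19.01_linux.run

During installation, make sure CUDA Toolkit is selected. The runfile also bundles a driver (465.19.01 for this file); because the driver was already installed in the previous step, deselect the Driver option so that it is not overwritten. The remaining options can be chosen as needed.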

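Configure the environment variables:

    # Edit the bashrc file
    nano ~/.bashrc

    # Append these lines to the end of the file
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

    # After saving, apply the changes
    source ~/.bashrc

Verify the CUDA installation:

    nvcc --version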
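The version that `nvcc --version` prints determines which PyTorch wheel tag to use (for example `cu113` for CUDA 11.3). As a rough sketch, the release can be extracted from the output line of the form "Cuda compilation tools, release 11.3, V11.3.58"; the function names here are illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from the 'release X.Y' part of nvcc's output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Return the installed toolkit release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"],
                            capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def wheel_tag(release):
    """Build a PyTorch-style wheel tag such as 'cu113' from (11, 3)."""
    major, minor = release
    return f"cu{major}{minor}"
```

If `installed_cuda_release()` returns `None` right after installation, the `PATH` export above has most likely not taken effect in the current shell.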

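1.3 Installing Python and Required Dependencies

Next, install Python and the related libraries:

    # Update system packages
    sudo apt update
    sudo apt upgrade -y

    # Install Python 3 and pip
    sudo apt install python3 python3-pip -y

    # Install FFmpeg (used for audio format conversion)
    sudo apt install ffmpeg -y

    # Install the Python virtual environment tool
    sudo apt install python3-venv -y

    # Create and activate a virtual environment
    python3 -m venv frcrn_env
    source frcrn_env/bin/activate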

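1.4 Installing PyTorch and ModelScope

Now install the core Python packages. Be sure to install a CUDA-enabled PyTorch build:

    # Install a CUDA-enabled PyTorch build that matches your CUDA version.
    # For CUDA 11.3, pin versions that were published as cu113 wheels; without
    # the pins, pip picks the newest release on PyPI, which is not a cu113 build.
    pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

    # Install ModelScope
    pip install modelscope

    # Install the other dependencies
    pip install librosa soundfile numpy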

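Verify that PyTorch can see the GPU:

    import torch

    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
    print(f"GPU count: {torch.cuda.device_count()}")
    # get_device_name(0) raises an error when no GPU is visible, so guard it
    if torch.cuda.is_available():
        print(f"Current GPU: {torch.cuda.get_device_name(0)}")

If CUDA is reported as available and your GPU model is shown, the environment is configured correctly.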

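2. Getting and Preparing the FRCRN Project

With the environment ready, the next step is to get the FRCRN project code.

2.1 Downloading the Project Code

    # Clone the project (if Git is installed)
    git clone https://github.com/modelscope/modelscope.git
    cd modelscope

    # Or download the ZIP archive directly
    wget https://github.com/modelscope/modelscope/archive/refs/heads/master.zip
    unzip master.zip
    cd modelscope-master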

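2.2 Preparing Test Audio

The FRCRN model has specific requirements for its input audio, so prepare a suitable test file.

Audio requirements:

Sample rate: must be 16000 Hz (16 kHz)
Channels: mono
Format: .wav recommended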
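A file can be checked against these requirements before it is handed to the model. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only; the function name `check_wav` is an illustrative choice.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems; an empty list means the file looks suitable."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

A non-empty result means the file should be converted first, for example with the FFmpeg command shown in this section.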

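If you don't have suitable test audio on hand, you can create some in one of the following ways.

Method 1: Record test audio

    # Install the recording tool
    sudo apt install sox

    # Record 10 seconds of 16 kHz mono audio (press Ctrl+C to stop early)
    rec -r 16000 -c 1 test_noisy.wav trim 0 10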

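Method 2: Convert existing audio

If your audio does not meet the requirements, convert it with FFmpeg:

    # Convert to 16 kHz mono WAV
    ffmpeg -i your_audio.mp3 -ar 16000 -ac 1 input_noisy.wav

    # Inspect the converted file
    ffprobe input_noisy.wav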

Method 3: Download sample audio

    # Download a noisy sample file (example.com is a placeholder; substitute a real URL)
    wget https://example.com/noisy_audio.wav -O input_noisy.wav

2.3 Project Structure

Knowing the project layout helps with the later steps:

    FRCRN/
    ├── test.py              # Main test script
    ├── requirements.txt     # Dependency list
    ├── README.md            # Documentation
    └── examples/            # Example files
        └── noisy_audio.wav  # Sample audio

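3. Running Your First Denoising Example

Everything is in place, so let's run a first denoising test.

3.1 Basic Usage

Change into the project directory and run the test script:

    # Enter the FRCRN directory
    cd demos/speech_frcrn_ans_cirm_16k

    # Run the test script
    python test.py

The first run is slow because the model files (a few hundred MB) have to be downloaded. Once the download finishes, the denoised audio is written to the output path set in the script (examples/denoised_audio.wav in test.py).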

3.2 Understanding the Code

Here is what test.py does:

    #!/usr/bin/env python3
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Create the denoising pipeline
    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k'
    )

    # Specify the input and output files
    input_path = 'examples/noisy_audio.wav'
    output_path = 'examples/denoised_audio.wav'

    # Run denoising
    result = ans_pipeline(input_path, output_path=output_path)
    print(f'Denoising finished. Output file: {output_path}')

This code does three things:

Creates a denoising pipeline
Specifies the input and output file paths
Runs denoising and saves the result

3.3 Custom Input and Output

Processing your own audio files takes only a small change to the code:

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Initialize the pipeline
    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k'
    )

    # Use your own files
    your_input = 'path/to/your/noisy_audio.wav'
    your_output = 'path/to/your/clean_audio.wav'

    # Run denoising
    result = ans_pipeline(your_input, output_path=your_output)
    print(f'Your audio has been processed and saved to: {your_output}')

4. Advanced Usage Tips

With the basics covered, here are some techniques that make the tool more convenient to use.

4.1 Batch Processing Multiple Files

If you have many audio files to process, a simple batch script does the job:

    import os

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks


    def batch_denoise(input_dir, output_dir):
        """Denoise every .wav file in input_dir."""
        # Create the output directory
        os.makedirs(output_dir, exist_ok=True)

        # Initialize the pipeline once and reuse it for every file
        ans_pipeline = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )

        # Iterate over all wav files in the input directory
        for filename in os.listdir(input_dir):
            if filename.endswith('.wav'):
                input_path = os.path.join(input_dir, filename)
                output_path = os.path.join(output_dir, f'clean_{filename}')

                print(f'Processing: {filename}')
                try:
                    ans_pipeline(input_path, output_path=output_path)
                    print(f'✓ Done: {filename}')
                except Exception as e:
                    print(f'✗ Failed {filename}: {e}')

        print('All files processed.')


    # Usage example
    batch_denoise('noisy_audios/', 'clean_audios/')

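4.2 Monitoring GPU Usage

When processing a large amount of audio, you can monitor how the GPU is being used:

    # Run in a second terminal to watch the GPU status in real time
    watch -n 1 nvidia-smi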

You will see output similar to this:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
    | 30%   45C    P2    72W / 250W |   3421MiB /  8192MiB |     45%      Default |
    |                               |                      |                  N/A |
    +-----------------------------------------------------------------------------+

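Watch these metrics:

GPU-Util: GPU utilization; the higher it is, the more fully the GPU is being used
Memory-Usage: the amount of VRAM in use
Temp: GPU temperature, normally between 40 and 80 °C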
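These three metrics can also be read programmatically, which is handy for logging during a long batch job. The sketch below queries them with `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. The helper names and the 30% utilization threshold are assumptions of this example; the 80 °C limit comes from the range given above.

```python
import shutil
import subprocess

QUERY_FIELDS = "utilization.gpu,memory.used,memory.total,temperature.gpu"


def parse_gpu_stats(line):
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    util, used, total, temp = (int(field.strip()) for field in line.split(","))
    return {"util_pct": util, "mem_used_mib": used,
            "mem_total_mib": total, "temp_c": temp}


def read_gpu_stats():
    """Return one stats dict per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_stats(row) for row in result.stdout.strip().splitlines()]


def flag_issues(stats, max_temp_c=80, min_util_pct=30):
    """Return human-readable warnings for one GPU's stats."""
    issues = []
    if stats["temp_c"] > max_temp_c:
        issues.append("temperature above the normal range")
    if stats["util_pct"] < min_util_pct:
        issues.append("GPU is underutilized")
    return issues
```

Persistently low utilization during batch processing usually points to a bottleneck outside the GPU, such as file I/O or model loading.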

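4.3 Performance Tuning Tips

A few suggestions for getting more speed out of FRCRN.

Adjust the batch size:

    # If VRAM allows, try a larger batch size.
    # Each pipeline call in this tutorial processes a single file, so confirm
    # that your ModelScope version honors batch_size for this task.
    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k',
        batch_size=4  # reportedly 1 by default; adjust to fit your VRAM
    )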

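Allow lower-precision float32 matrix multiplication (this can speed up computation on GPUs with TF32 support, such as Ampere and newer; it does not switch the model to half precision and does not reduce VRAM use):

    import torch
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Set this before initializing the pipeline (requires PyTorch 1.12 or later).
    # 'high' permits TF32 for float32 matmuls; 'medium' permits bfloat16-based matmuls.
    torch.set_float32_matmul_precision('medium')  # or 'high'

    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k'
    )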

Preprocess the audio files:

    import librosa
    import soundfile as sf


    def preprocess_audio(input_path, output_path):
        """Preprocess audio: force a 16 kHz sample rate and mono."""
        audio, sr = librosa.load(input_path, sr=16000, mono=True)
        sf.write(output_path, audio, 16000)
        return output_path


    # Use the preprocessed file as the pipeline input
    clean_input = preprocess_audio('your_audio.mp3', 'preprocessed.wav')
    result = ans_pipeline(clean_input, output_path='denoised.wav')

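5. Common Problems and Solutions

In practice you may run into a few problems. Here are some common ones and how to resolve them.

5.1 Audio Problems

Problem: the denoised audio sounds pitch-shifted or contains artifacts

This usually happens because the input sample rate is not 16 kHz. FRCRN does not resample automatically, so you need to convert the file first.

Solution:

    # Convert with FFmpeg
    ffmpeg -i original.wav -ar 16000 -ac 1 -c:a pcm_s16le input_16k.wav

Or do the conversion in Python:

    import librosa
    import soundfile as sf

    audio, sr = librosa.load('original.wav', sr=16000, mono=True)
    sf.write('input_16k.wav', audio, 16000)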

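Problem: the processed audio is too quiet

Denoising can lower the overall volume. You can adjust it afterwards:

    # Raise the volume with FFmpeg (by 10 dB)
    ffmpeg -i denoised.wav -af "volume=10dB" louder.wav

    # Or normalize the loudness
    ffmpeg -i denoised.wav -af "loudnorm=I=-16:LRA=11:TP=-1.5" normalized.wav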
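A fixed +10 dB boost multiplies every sample by about 3.16, so it clips whenever the existing peak is above roughly -10 dBFS. The arithmetic can be sketched in plain Python as follows; the function names and the -1 dBFS target are illustrative assumptions, and samples are taken to be floats in [-1.0, 1.0].

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in decibels to an amplitude multiplier."""
    return 10 ** (gain_db / 20)


def peak_dbfs(samples):
    """Peak level in dBFS for float samples in the range [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)


def max_safe_gain_db(samples, target_peak_dbfs=-1.0):
    """Largest gain (dB) that keeps the peak at or below the target level."""
    return target_peak_dbfs - peak_dbfs(samples)
```

If `max_safe_gain_db` returns less than the boost you intended, a loudness normalizer such as the `loudnorm` filter above is the safer choice, since it limits the true peak.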

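5.2 Performance Problems

Problem: the first run is very slow

This is expected, because the model files have to be downloaded. They total a few hundred MB and are cached locally once downloaded, so later runs start much faster.

To find the model cache location:

    from modelscope.hub.snapshot_download import snapshot_download

    model_dir = snapshot_download('damo/speech_frcrn_ans_cirm_16k')
    print(f'Model cache location: {model_dir}')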

Problem: the GPU runs out of memory

If you hit an out-of-memory error when processing very long audio or large batches, there are two options.

Method 1: use CPU mode (slow but stable)

    ans_pipeline = pipeline(
        Tasks.acoustic_noise_suppression,
        model='damo/speech_frcrn_ans_cirm_16k',
        device='cpu'  # force CPU
    )

Method 2: process long audio in segments

    import os

    import librosa
    import numpy as np
    import soundfile as sf


    def process_long_audio(input_path, output_path, chunk_duration=30):
        """Process long audio in segments of chunk_duration seconds (30 by default)."""
        audio, sr = librosa.load(input_path, sr=16000, mono=True)
        chunk_samples = chunk_duration * sr
        processed_chunks = []

        for i in range(0, len(audio), chunk_samples):
            chunk = audio[i:i + chunk_samples]

            # Write the segment to a temporary file
            temp_input = f'temp_input_{i}.wav'
            temp_output = f'temp_output_{i}.wav'
            sf.write(temp_input, chunk, sr)

            # Denoise the current segment (ans_pipeline is created beforehand)
            ans_pipeline(temp_input, output_path=temp_output)

            # Read the result back
            processed_chunk, _ = librosa.load(temp_output, sr=sr, mono=True)
            processed_chunks.append(processed_chunk)

            # Remove the temporary files
            os.remove(temp_input)
            os.remove(temp_output)

        # Concatenate all segments
        full_audio = np.concatenate(processed_chunks)
        sf.write(output_path, full_audio, sr)
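Cutting the audio at hard boundaries, as the segmented approach does, can leave audible seams where two independently denoised segments meet. One common mitigation, sketched here as an assumption rather than something the tool provides, is to let neighboring segments overlap slightly and cross-fade the overlap. The helper names are hypothetical, and plain Python lists are used so that the logic is easy to follow.

```python
def chunk_ranges(total, chunk, overlap):
    """Yield (start, end) sample ranges that overlap by `overlap` samples."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk")
    step = chunk - overlap
    start = 0
    while start < total:
        end = min(start + chunk, total)
        yield start, end
        if end == total:
            break
        start += step


def crossfade_join(chunks, overlap):
    """Join processed chunks, linearly cross-fading each overlapping seam."""
    if not chunks:
        return []
    out = list(chunks[0])
    for nxt in chunks[1:]:
        n = min(overlap, len(out), len(nxt))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the next chunk
            out[base + i] = out[base + i] * (1 - w) + nxt[i] * w
        out.extend(nxt[n:])
    return out
```

With 30-second segments at 16 kHz, an overlap of a few hundred milliseconds (a few thousand samples) is a reasonable starting point; the joined result has the same length as the original signal.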

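5.3 Model Problems

Problem: you want to try other denoising models

ModelScope hosts other speech processing models as well:

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # Other denoising models (confirm each ID on ModelScope before use)
    models = [
        'damo/speech_dfsmn_ans_psm_48k_causal',  # real-time denoising
        'damo/speech_frcrn_ans_cirm_16k',        # the one used in this tutorial
        'damo/speech_mossformer_ans_cirm_16k',   # a different architecture
    ]


    # A helper for trying out different models
    def compare_models(input_path, models_list):
        results = {}
        for model_name in models_list:
            print(f'Testing model: {model_name}')
            pipeline_obj = pipeline(
                Tasks.acoustic_noise_suppression,
                model=model_name
            )
            # Each model expects the sample rate in its ID (16k or 48k),
            # so resample input_path to match before comparing.
            output_path = f'output_{model_name.replace("/", "_")}.wav'
            pipeline_obj(input_path, output_path=output_path)
            results[model_name] = output_path
        return results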

Problem: how to evaluate the denoising result

Subjective listening matters, but objective metrics can be used as well. The signal-to-noise ratio below requires a clean reference recording of the same utterance:

    import librosa
    import numpy as np


    def calculate_snr(clean_path, noisy_path):
        """Compute the signal-to-noise ratio in dB against a clean reference."""
        clean, _ = librosa.load(clean_path, sr=16000)
        noisy, _ = librosa.load(noisy_path, sr=16000)

        # Trim to a common length (the two signals must also be time-aligned)
        min_len = min(len(clean), len(noisy))
        clean = clean[:min_len]
        noisy = noisy[:min_len]

        # Noise is whatever differs from the clean reference
        noise = noisy - clean

        # Compute the energies
        signal_power = np.sum(clean**2)
        noise_power = np.sum(noise**2)

        # Avoid division by zero
        if noise_power == 0:
            return float('inf')

        snr = 10 * np.log10(signal_power / noise_power)
        return snr


    # Usage example
    original_snr = calculate_snr('clean_reference.wav', 'noisy.wav')
    denoised_snr = calculate_snr('clean_reference.wav', 'denoised.wav')
    print(f'Original SNR: {original_snr:.2f} dB')
    print(f'Denoised SNR: {denoised_snr:.2f} dB')
    print(f'Improvement: {denoised_snr - original_snr:.2f} dB')
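Because this SNR treats every difference from the reference as noise, a denoised file that is simply quieter than the reference scores badly even when it is clean. A scale-invariant variant (often called SI-SNR) first projects the estimate onto the reference so that overall gain does not matter. The following is a minimal pure-Python sketch of both measures for roughly zero-mean signals; it is offered as an illustration, not as the metric used by the FRCRN authors.

```python
import math


def _energy(xs):
    return sum(x * x for x in xs)


def snr_db(reference, estimate):
    """Plain SNR: any deviation from the reference counts as noise."""
    noise_energy = _energy([e - r for r, e in zip(reference, estimate)])
    if noise_energy == 0:
        return float("inf")
    return 10 * math.log10(_energy(reference) / noise_energy)


def si_snr_db(reference, estimate):
    """Scale-invariant SNR: insensitive to the overall gain of the estimate."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    scale = dot / _energy(reference)
    target = [scale * r for r in reference]          # part explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy, residual_energy = _energy(target), _energy(residual)
    if target_energy == 0:
        return float("-inf")
    if residual_energy == 0:
        return float("inf")
    return 10 * math.log10(target_energy / residual_energy)
```

Halving the volume of an estimate leaves `si_snr_db` unchanged while `snr_db` drops sharply, which is why the scale-invariant form is often preferred when the output level differs from the reference.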

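6. Practical Applications

With the basics covered, let's look at what FRCRN can do in real work.

6.1 Improving Voice Call Quality

If you are building a voice-call or conferencing application, FRCRN can improve speech quality. The class below processes audio in 3-second blocks, so it adds at least that much latency:

    import os
    import tempfile

    import numpy as np
    import soundfile as sf
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    SAMPLE_RATE = 16000


    class RealTimeDenoiser:
        """Block-based (near-real-time) speech denoiser."""

        def __init__(self):
            self.pipeline = pipeline(
                Tasks.acoustic_noise_suppression,
                model='damo/speech_frcrn_ans_cirm_16k'
            )

        def process_chunk(self, audio_chunk):
            """Denoise one float32 mono block sampled at 16 kHz."""
            # Temporary files are removed when the directory context exits
            with tempfile.TemporaryDirectory() as tmp_dir:
                noisy_path = os.path.join(tmp_dir, 'noisy.wav')
                clean_path = os.path.join(tmp_dir, 'clean.wav')
                sf.write(noisy_path, audio_chunk, SAMPLE_RATE)

                # Denoise
                self.pipeline(noisy_path, output_path=clean_path)

                # Read the result back
                cleaned_chunk, _ = sf.read(clean_path, dtype='float32')
            return cleaned_chunk

        def real_time_processing(self, input_stream, output_stream):
            """Process a stream of raw float32 PCM (assumed format) block by block."""
            chunk_bytes = SAMPLE_RATE * 3 * 4  # 3 seconds of 4-byte float32 samples

            while True:
                # Read from the input stream
                data = input_stream.read(chunk_bytes)
                if not data:
                    break

                # Convert the raw bytes to samples and denoise the block
                chunk = np.frombuffer(data, dtype=np.float32)
                cleaned = self.process_chunk(chunk)

                # Write to the output stream
                output_stream.write(cleaned.astype(np.float32).tobytes())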

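6.2 Podcast and Video Voice-Over Processing

Background noise is a constant headache when producing podcasts or video voice-overs. FRCRN can handle an episode end to end:

    import os
    import subprocess

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks


    def process_podcast_episode(input_file, output_file):
        """Denoise a single podcast episode."""
        print(f"Processing: {input_file}")

        # 1. Preprocess: convert to 16 kHz mono
        temp_16k = 'temp_16k.wav'
        subprocess.run(
            ['ffmpeg', '-y', '-i', input_file, '-ar', '16000', '-ac', '1', temp_16k],
            check=True
        )

        # 2. Denoise
        pipeline_obj = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )
        temp_clean = 'temp_clean.wav'
        pipeline_obj(temp_16k, output_path=temp_clean)

        # 3. Postprocess: restore the original sample rate if needed.
        #    Upsampling does not bring back content above 8 kHz.
        probe = subprocess.run(
            ['ffprobe', '-v', 'error', '-select_streams', 'a:0',
             '-show_entries', 'stream=sample_rate',
             '-of', 'default=noprint_wrappers=1:nokey=1', input_file],
            capture_output=True, text=True, check=True
        )
        original_sr = int(probe.stdout.strip())

        if original_sr != 16000:
            subprocess.run(
                ['ffmpeg', '-y', '-i', temp_clean, '-ar', str(original_sr), output_file],
                check=True
            )
        else:
            os.replace(temp_clean, output_file)

        # 4. Clean up temporary files
        for temp_file in [temp_16k, temp_clean]:
            if os.path.exists(temp_file):
                os.remove(temp_file)

        print(f"Finished: {output_file}")
        return output_file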

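6.3 Preprocessing for Speech Recognition

If you use automatic speech recognition (ASR), denoising first can improve recognition accuracy on noisy recordings:

    import os

    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks


    def asr_with_denoise(audio_path, asr_model):
        """Speech recognition with a denoising step in front."""
        # Step 1: denoise
        print("Denoising...")
        denoiser = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )
        clean_audio_path = 'clean_for_asr.wav'
        denoiser(audio_path, output_path=clean_audio_path)

        # Step 2: speech recognition
        print("Running speech recognition...")
        transcription = asr_model.transcribe(clean_audio_path)

        # Clean up the temporary file
        if os.path.exists(clean_audio_path):
            os.remove(clean_audio_path)

        return transcription


    # Usage example
    # Assuming you have an ASR model object with a transcribe() method:
    # result = asr_with_denoise('noisy_meeting.wav', your_asr_model)
    # print(f"Transcription: {result}")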

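6.4 Cleaning Up an Audio Library

If you have a large archive of old recordings to clean up:

    import os
    from multiprocessing import get_context

    from tqdm import tqdm  # progress bar library (pip install tqdm)

    _pipeline = None  # one pipeline per worker process


    def init_worker():
        """Load the model once per worker instead of once per file."""
        global _pipeline
        from modelscope.pipelines import pipeline
        from modelscope.utils.constant import Tasks
        _pipeline = pipeline(
            Tasks.acoustic_noise_suppression,
            model='damo/speech_frcrn_ans_cirm_16k'
        )


    def process_single_file(args):
        """Process one file (runs in a worker process)."""
        input_file, output_dir = args
        filename = os.path.basename(input_file)
        try:
            # Build the output path
            output_file = os.path.join(output_dir, f'clean_{filename}')

            # Skip files that were already processed
            if os.path.exists(output_file):
                return f"Skipped: {filename}"

            _pipeline(input_file, output_path=output_file)
            return f"Done: {filename}"
        except Exception as e:
            return f"Failed: {filename} - {e}"


    def batch_process_audio_library(input_dir, output_dir, num_workers=4):
        """Batch-process an audio library.

        Call this from under `if __name__ == '__main__':`. Every worker loads
        its own copy of the model, so keep num_workers within your VRAM budget.
        """
        # Collect all wav files
        audio_files = []
        for root, dirs, files in os.walk(input_dir):
            for file in files:
                if file.lower().endswith('.wav'):
                    audio_files.append(os.path.join(root, file))
        print(f"Found {len(audio_files)} audio files")

        # Create the output directory
        os.makedirs(output_dir, exist_ok=True)

        # Prepare the arguments
        tasks = [(f, output_dir) for f in audio_files]

        # 'spawn' avoids CUDA re-initialization errors in forked workers
        with get_context('spawn').Pool(num_workers, initializer=init_worker) as pool:
            results = list(tqdm(
                pool.imap(process_single_file, tasks),
                total=len(tasks),
                desc="Progress"
            ))

        # Print summary statistics
        success = sum(1 for r in results if r.startswith('Done'))
        skipped = sum(1 for r in results if r.startswith('Skipped'))
        failed = sum(1 for r in results if r.startswith('Failed'))
        print(f"\nFinished. Done: {success}, skipped: {skipped}, failed: {failed}")

        # List the failures
        if failed > 0:
            print("\nFailed files:")
            for r in results:
                if r.startswith('Failed'):
                    print(f"  {r}")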
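The library script walks subdirectories but names each output by basename only, so two recordings with the same filename in different folders map to the same output file and the second one is skipped as already processed. One way around this, sketched here with a hypothetical helper, is to mirror the input directory structure under the output directory.

```python
import os


def mirrored_output_path(input_file, input_dir, output_dir, prefix="clean_"):
    """Map input_dir/sub/x.wav to output_dir/sub/clean_x.wav."""
    relative = os.path.relpath(input_file, input_dir)
    sub_dir, name = os.path.split(relative)
    return os.path.join(output_dir, sub_dir, prefix + name)


def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it does not exist."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
```

Inside the worker function, computing the output path with `mirrored_output_path` and calling `ensure_parent_dir` before the pipeline runs keeps same-named files from different folders apart.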