
YOLO12 Step-by-Step Tutorial: Gradio Queue Throttling and Concurrency Control to Prevent GPU OOM Crashes

news 2026/7/28 11:41:12


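1. Introduction: Why Is Concurrency Control Needed?

If you have served YOLO12 for object detection, you may have seen this: several people upload images at the same time, the interface freezes, and then the whole service crashes. This is a typical GPU out-of-memory (OOM) failure.

YOLO12, an object detection model released in 2025, is fast at inference, but every inference task allocates its own GPU memory on top of the model weights. When several requests run at once, those allocations add up, GPU memory fills, and the service crashes.

This tutorial walks through using Gradio's queue and concurrency controls to prevent that failure. By the end, your YOLO12 service should be able to: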

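Stay stable when many requests arrive at once
Queue requests automatically so the GPU is not overloaded
Serve multiple users at the same time without crashing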

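2. Understanding Gradio's Queue Mechanism

2.1 What Is a Queue?

Put simply, a queue works like a supermarket checkout. Customers (user requests) wait in line, and the cashier (the GPU) serves them one at a time. Without a line, everyone crowds the checkout at once and nobody gets served.

Gradio's built-in queue is that "checkout". It can:

Process requests in order
Limit how many tasks run at the same time
Show queue position and estimated wait time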
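
The checkout analogy can be made concrete with a small simulation. The sketch below is not how Gradio implements its queue; it uses a plain threading.Semaphore from the standard library as a stand-in, and the names run_with_limit and handle_request are invented for this illustration. It shows only the core idea: however many requests arrive together, no more than the configured number run at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(num_requests, concurrency_limit, work_seconds=0.02):
    """Run fake jobs, letting at most `concurrency_limit` execute at once.

    Returns the highest number of jobs that were ever running simultaneously.
    """
    gate = threading.Semaphore(concurrency_limit)  # the "cashier" slots
    lock = threading.Lock()
    running = 0
    peak = 0

    def handle_request(_):
        nonlocal running, peak
        with gate:  # wait in line until a slot is free
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(work_seconds)  # stand-in for GPU inference
            with lock:
                running -= 1

    # All requests arrive together, like several users clicking at once.
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        list(pool.map(handle_request, range(num_requests)))
    return peak


peak_limited = run_with_limit(num_requests=8, concurrency_limit=1)
peak_unlimited = run_with_limit(num_requests=8, concurrency_limit=8)
print("peak with limit 1:", peak_limited)
print("peak with limit 8:", peak_unlimited)
```

With a limit of 1 the peak is always 1. With a limit of 8 the peak depends on thread scheduling, but it can climb to 8, which is the situation that exhausts GPU memory.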

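2.2 Basic Queue Configuration

Start with the simplest queue configuration:

    import gradio as gr

    def detect_function(image):
        # Placeholder that returns the input unchanged; section 3.2 has the real detector
        return image

    # Enable the queue when creating the interface
    demo = gr.Interface(
        fn=detect_function,
        inputs=gr.Image(),
        outputs=gr.Image(),
    ).queue()  # this call enables the queue

    demo.launch()

This already gives you basic queueing, but with default settings only. The next sections add more control step by step.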

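3. A Complete Anti-OOM Configuration

3.1 Environment Setup

First, make sure these libraries are installed (the quotes stop the shell from treating >= as an output redirect):

    pip install "gradio>=4.0" ultralytics torch torchvision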

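3.2 The YOLO12 Detection Function

This is the core detection function. Start with a basic version:

    from ultralytics import YOLO
    import cv2
    import numpy as np

    # Load the model once, at module level
    model = YOLO('yolo12m.pt')

    def yolo12_detect(image, conf_threshold=0.25, iou_threshold=0.45):
        """Run YOLO12 object detection and return the annotated image in RGB."""
        # Convert anything that is not already a NumPy array (for example a PIL image)
        if not isinstance(image, np.ndarray):
            image = np.array(image)

        # Gradio supplies RGB, while Ultralytics treats NumPy input as BGR
        if len(image.shape) == 3 and image.shape[2] == 3:
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Run inference
        results = model.predict(
            source=image,
            conf=conf_threshold,
            iou=iou_threshold,
            verbose=False,  # reduce log output
        )

        # plot() returns a BGR array; convert back to RGB for display
        annotated_image = results[0].plot()
        return cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB)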

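3.3 Advanced Queue Configuration

Now configure the full anti-OOM setup. The handler is defined before the interface so that the click event can reference it:

    import gradio as gr

    def process_detection(image, conf_threshold, iou_threshold):
        """Wrap the detection function and return a status message with the result."""
        if image is None:
            return None, "Please upload an image first"
        try:
            result_image = yolo12_detect(image, conf_threshold, iou_threshold)
            return result_image, "Detection complete!"
        except Exception as e:
            return None, f"Processing failed: {str(e)}"

    # Build the interface
    with gr.Blocks(title="YOLO12 Object Detection - Queue-Optimized") as demo:
        gr.Markdown("# 🚀 YOLO12 Object Detection (Anti-OOM)")
        gr.Markdown("Upload an image for detection; requests are queued automatically to avoid overloading the GPU")

        with gr.Row():
            with gr.Column():
                input_image = gr.Image(label="Upload image", type="numpy")
                conf_slider = gr.Slider(0.1, 0.9, value=0.25, label="Confidence threshold")
                iou_slider = gr.Slider(0.1, 0.9, value=0.45, label="IoU threshold")
                submit_btn = gr.Button("Run detection", variant="primary")
            with gr.Column():
                output_image = gr.Image(label="Detection result")
                status_text = gr.Textbox(label="Status", interactive=False)

        # Button click event (listeners must be registered inside the Blocks context)
        submit_btn.click(
            fn=process_detection,
            inputs=[input_image, conf_slider, iou_slider],
            outputs=[output_image, status_text],
        )

    # Configure the queue
    demo.queue(
        default_concurrency_limit=1,  # tasks processed at once (concurrency_count in Gradio 3.x)
        max_size=10,                  # maximum number of queued requests
        api_open=False,               # do not expose the API endpoints
    )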
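
To make the effect of max_size=10 easier to picture, here is a minimal sketch using only the standard library. It is an analogy rather than Gradio's implementation: a bounded queue.Queue plays the role of the waiting line, and submit_all is a hypothetical helper written for this example. Once the line is full, further arrivals are turned away instead of piling up.

```python
import queue


def submit_all(num_requests, max_size):
    """Try to enqueue requests into a bounded queue.

    Returns two lists: the request ids that were accepted and those rejected.
    """
    waiting = queue.Queue(maxsize=max_size)
    accepted, rejected = [], []
    for request_id in range(num_requests):
        try:
            waiting.put_nowait(request_id)  # raises queue.Full when no room is left
            accepted.append(request_id)
        except queue.Full:
            rejected.append(request_id)
    return accepted, rejected


accepted, rejected = submit_all(num_requests=13, max_size=10)
print(len(accepted), "accepted;", "rejected ids:", rejected)
# 10 accepted; rejected ids: [10, 11, 12]
```

Rejecting early is a deliberate trade-off: a user who is told the queue is full can retry later, whereas an unbounded line makes everyone wait longer and longer.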

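4. Tuning Concurrency Control

4.1 Understanding the Concurrency Setting

The concurrency limit is the most important parameter: it controls how many tasks run at the same time. Gradio 3.x named it concurrency_count; Gradio 4 and later use default_concurrency_limit. Why set it to 1?

YOLO12 characteristics: every inference task needs a large amount of GPU memory
Safety first: a single task may already use most of the available memory
Stability over speed: slower responses are better than a crashed service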

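If your GPU has plenty of memory (for example, 24GB to 48GB), you can raise the limit after measuring peak usage:

    demo.queue(
        default_concurrency_limit=2,  # e.g. 2 on a 24GB card (concurrency_count in Gradio 3.x)
        max_size=15,
    )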
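
One way to pick the limit is a simple memory budget: subtract the model weights and a safety reserve from total GPU memory, then divide by the peak memory of one inference. The function below is a rough sketch of that arithmetic. The name max_safe_concurrency and every number in the example calls are invented for illustration; they are not measurements of YOLO12, so substitute values you have measured on your own hardware.

```python
def max_safe_concurrency(total_gb, weights_gb, per_request_gb, reserve_fraction=0.2):
    """Estimate how many inferences fit in GPU memory at the same time.

    total_gb:         total GPU memory
    weights_gb:       memory held permanently by the loaded model
    per_request_gb:   measured peak extra memory of a single inference
    reserve_fraction: share of total memory kept free as a safety margin
    """
    if per_request_gb <= 0:
        raise ValueError("per_request_gb must be positive")
    usable = total_gb * (1 - reserve_fraction) - weights_gb
    # A service needs at least one worker, so never return less than 1.
    return max(1, int(usable // per_request_gb))


# Hypothetical figures: 1 GB of weights, 8 GB peak per request.
for total in (8, 24, 48):
    print(total, "GB ->", max_safe_concurrency(total, weights_gb=1, per_request_gb=8))
# 8 GB -> 1
# 24 GB -> 2
# 48 GB -> 4
```

The estimate is only as good as the per-request figure, which grows with image resolution and batch size, so it is worth measuring with the largest inputs you expect.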

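4.2 Memory Monitoring and Dynamic Adjustment

A more advanced approach is to choose the concurrency level from current GPU memory usage. The helper below only returns a suggested value; it does not change a running queue, so apply the result when you call demo.queue() at startup:

    import pynvml

    pynvml.nvmlInit()  # initialize NVML once, at import time

    def get_gpu_memory():
        """Return the fraction of GPU 0 memory currently in use (0.0 to 1.0)."""
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.used / info.total

    def adaptive_concurrency():
        """Suggest a concurrency level based on current memory usage."""
        memory_usage = get_gpu_memory()
        if memory_usage > 0.8:  # more than 80% in use
            return 1
        elif memory_usage > 0.6:
            return 2
        else:
            return 3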
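
If the limit cannot be changed while the service is running, a complementary option is to check memory inside the handler and wait until usage drops before starting inference. The following is a minimal sketch under that assumption. wait_for_memory is a hypothetical helper, and read_usage stands for any callable that returns the used fraction of GPU memory, such as get_gpu_memory above. The sleep and clock parameters exist only so the example can run without a GPU or real delays.

```python
import time


def wait_for_memory(read_usage, threshold=0.8, poll_seconds=0.5,
                    timeout_seconds=30.0, sleep=time.sleep, clock=time.monotonic):
    """Block until read_usage() falls below threshold.

    Returns True once memory is available, or False if the timeout expires first.
    """
    deadline = clock() + timeout_seconds
    while True:
        if read_usage() < threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Simulated readings: memory is busy twice, then frees up.
readings = iter([0.95, 0.90, 0.70])
ok = wait_for_memory(lambda: next(readings), threshold=0.8, sleep=lambda s: None)
print(ok)  # True
```

A handler could call this before inference and return a "server busy, try again" message when it gets False, rather than risking an OOM crash.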

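4.3 Queue Status Display

Let users know what is happening with their request. Once the queue is enabled, Gradio's front end shows each waiting user their queue position and estimated wait on its own, so the handler only needs to report its own progress. Writing the handler as a generator lets it show a status message before the result is ready:

    def process_detection(image, conf_threshold, iou_threshold):
        # First update: shown as soon as this request leaves the queue
        yield None, "Processing..."

        # Run detection
        result_image = yolo12_detect(image, conf_threshold, iou_threshold)

        # Final update: the annotated image and a completion message
        yield result_image, "Detection complete!"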

5. In Practice: Complete Working Code

Here is the complete code, intended to run as-is with Gradio 4 or later:

    import gradio as gr
    import numpy as np
    from ultralytics import YOLO
    import cv2

    # Load the model once, globally
    model = YOLO('yolo12m.pt')

    class YOLO12Detector:
        def __init__(self):
            self.model = model

        def detect(self, image, conf_threshold=0.25, iou_threshold=0.45):
            """Run YOLO12 detection and return the annotated image in RGB."""
            if image is None:
                return None

            # Convert the image format
            if not isinstance(image, np.ndarray):
                image = np.array(image)

            # Gradio supplies RGB; Ultralytics expects BGR for NumPy input
            if len(image.shape) == 3 and image.shape[2] == 3:
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

            # Inference
            results = self.model.predict(
                source=image,
                conf=conf_threshold,
                iou=iou_threshold,
                verbose=False,
            )

            # Draw the results and convert back to RGB for display
            annotated_image = results[0].plot()
            annotated_image = cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB)
            return annotated_image

    # Create the detector instance
    detector = YOLO12Detector()

    def process_detection(image, conf_threshold, iou_threshold):
        """Handle a detection request."""
        try:
            if image is None:
                return None, "Please upload an image"

            # Run detection
            result = detector.detect(image, conf_threshold, iou_threshold)
            if result is None:
                return None, "Detection failed"
            return result, "Detection complete"
        except Exception as e:
            return None, f"Error: {str(e)}"

    # Build the Gradio interface
    with gr.Blocks(theme=gr.themes.Soft()) as demo:
        gr.Markdown("""
        # 🎯 YOLO12 Object Detection System
        ### Stable version with queue protection
        """)

        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown("### 📤 Input")
                input_image = gr.Image(label="Upload image", type="numpy")
                conf_slider = gr.Slider(0.1, 0.9, value=0.25, label="Confidence threshold")
                iou_slider = gr.Slider(0.1, 0.9, value=0.45, label="IoU threshold")
                submit_btn = gr.Button("Run detection", variant="primary")
            with gr.Column(scale=1):
                gr.Markdown("### 📥 Output")
                output_image = gr.Image(label="Detection result")
                status = gr.Textbox(label="Status", interactive=False)

        # Bind the event (inside the Blocks context)
        submit_btn.click(
            fn=process_detection,
            inputs=[input_image, conf_slider, iou_slider],
            outputs=[output_image, status],
        )

    # Configure the queue (Gradio 4+ uses default_concurrency_limit instead of concurrency_count)
    demo.queue(
        default_concurrency_limit=1,
        max_size=10,
    )

    # Start the service
    if __name__ == "__main__":
        demo.launch(
            server_name="0.0.0.0",
            server_port=7860,
            share=False,
        )

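6. Deployment and Optimization Tips

6.1 Production Deployment

For production, start the service in a way that survives the terminal closing:

    # Run in the background with nohup
    nohup python app.py > server.log 2>&1 &

    # Gradio apps are ASGI rather than WSGI, so a plain gunicorn WSGI worker cannot serve demo.
    # Alternative, assuming app.py mounts demo on a FastAPI object named app
    # (via gr.mount_gradio_app): serve it with uvicorn, keeping a single worker
    # so that only one copy of the model sits in GPU memory.
    uvicorn app:app --host 0.0.0.0 --port 7860 --workers 1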

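6.2 Performance Monitoring

Add simple timing to the handler:

    import time

    def process_detection(image, conf_threshold, iou_threshold):
        start_time = time.perf_counter()
        try:
            # Detection logic (here, the detector from section 5)
            result = detector.detect(image, conf_threshold, iou_threshold)
            process_time = time.perf_counter() - start_time
            return result, f"Detection complete! Took {process_time:.2f}s"
        except Exception as e:
            return None, f"Error: {str(e)}"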
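
A single timing per request says little about trends. As one possible extension, the sketch below keeps a rolling window of recent latencies and records them through a decorator, so the handler itself stays unchanged. LatencyTracker, timed, and fake_detect are names invented for this example, and fake_detect merely sleeps in place of real inference.

```python
import time
from collections import deque
from functools import wraps


class LatencyTracker:
    """Keep the most recent `window` latencies and summarize them."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def record(self, seconds):
        self.samples.append(seconds)

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def worst(self):
        return max(self.samples) if self.samples else 0.0


def timed(tracker):
    """Decorator that records how long each call takes, even when it raises."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                tracker.record(time.perf_counter() - start)
        return wrapper
    return decorator


tracker = LatencyTracker(window=3)


@timed(tracker)
def fake_detect(delay):
    time.sleep(delay)  # stand-in for model inference
    return "ok"


for delay in (0.01, 0.02, 0.03, 0.04):
    fake_detect(delay)

print(f"samples kept: {len(tracker.samples)}")
print(f"average: {tracker.average():.3f}s, worst: {tracker.worst():.3f}s")
```

Because the window holds three samples, the first call has already been dropped by the time the summary is printed. A rising average under steady load is an early hint that the concurrency limit or queue size needs revisiting.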

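6.3 Error Handling and Retries

Strengthen error handling with a retry on out-of-memory errors:

    import time
    import torch

    def safe_detect(image, conf_threshold, iou_threshold, max_retries=3):
        """Detection with retries when the GPU runs out of memory."""
        for attempt in range(max_retries):
            try:
                return detector.detect(image, conf_threshold, iou_threshold)
            except RuntimeError as e:
                if "CUDA out of memory" in str(e):
                    torch.cuda.empty_cache()  # release cached GPU memory
                    time.sleep(1)             # wait one second before retrying
                    continue
                raise  # not an OOM error: re-raise unchanged
        raise RuntimeError("Detection failed: GPU out of memory")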
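
A fixed one-second wait may be too short when memory is under sustained pressure. A common variation is exponential backoff, where each retry waits twice as long as the previous one. The sketch below shows that pattern in isolation, without torch, so it can run anywhere. retry_on_oom is a hypothetical helper; in the real service, on_oom would be something like torch.cuda.empty_cache and fn would wrap the detector call.

```python
import time


def retry_on_oom(fn, max_retries=3, base_delay=1.0, on_oom=None, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on out-of-memory errors.

    Other RuntimeErrors propagate immediately. After max_retries failed
    attempts a RuntimeError is raised.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # unrelated failure: do not retry
            if on_oom is not None:
                on_oom()  # e.g. free cached GPU memory
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still out of memory after {max_retries} attempts")


# Simulated detector that fails twice with an OOM error, then succeeds.
calls = {"count": 0}


def flaky_detect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("CUDA out of memory (simulated)")
    return "detections"


delays = []
result = retry_on_oom(flaky_detect, sleep=delays.append)
print(result, delays)  # detections [1.0, 2.0]
```

Retries are a last line of defense. If they trigger often, lowering the concurrency limit is the more reliable fix.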