
Improving the Neck-Layer Feature Pyramid of YOLO for Aerial Image Detection: A Complete Implementation and Performance Optimization Guide

Abstract

Aerial image detection has important applications in UAV inspection, smart cities, agricultural monitoring, and related fields. However, aerial images exhibit large variations in object scale, complex backgrounds, and dense small objects, placing higher demands on detection algorithms. This article presents a YOLOv8 variant with an improved Neck-layer feature pyramid: by introducing a weighted bidirectional feature pyramid (BiFPN), an adaptive feature fusion (AFF) module, and atrous spatial pyramid pooling (ASPP), it significantly improves multi-scale detection in aerial imagery. In experiments on two public aerial datasets, VisDrone and DIOR, the improved algorithm raises mAP50 by 5.2% and 4.8% respectively, with particularly notable gains on small objects. The article covers the algorithm design, the PyTorch implementation, training tips, and experimental analysis, providing a complete solution for aerial image detection tasks.

Keywords: YOLOv8; feature pyramid; aerial imagery; small object detection; BiFPN

1. Introduction

1.1 Challenges of Aerial Image Detection

With the spread of UAV technology, aerial image analysis has become a research hotspot in computer vision. Compared with natural-scene images, aerial images have the following notable characteristics:

  1. Extreme scale variation: a single aerial image may contain large structures such as buildings and parking lots alongside pedestrians and vehicles only a few dozen pixels in size, so multi-scale behavior is especially pronounced

  2. Dense small objects: tightly parked vehicles, crowds, and similar targets occlude one another and are hard to separate

  3. Complex background clutter: illumination changes, shadows, and building textures produce large numbers of false alarms

  4. Unusual viewpoint: the top-down perspective makes object appearance differ substantially from that in conventional datasets

1.2 Limitations of the YOLO Family on Aerial Detection

With their end-to-end architecture and real-time inference speed, YOLO detectors have become the most widely deployed detection framework in industry. Standard YOLO, however, has the following problems on aerial images:

  1. Simplistic feature pyramid: conventional FPN or PANet fuses features along simple top-down and bottom-up paths, treating all scales as equal contributors and lacking adaptivity

  2. Loss of small-object features: as the network deepens, the semantic signal of small objects all but vanishes after repeated downsampling

  3. Limited fusion operations: plain addition or concatenation cannot fully exploit the complementary information across scales
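The downsampling problem is easy to quantify: at a stride-32 output level, an object smaller than 32x32 pixels occupies less than one feature-map cell. A quick back-of-the-envelope check (pure Python, no framework needed; the 20-pixel vehicle is an illustrative size, not a dataset statistic):

```python
# Footprint of an object on each YOLO output level.
# A 640x640 input yields P3/P4/P5 feature maps at strides 8/16/32.
def cells_covered(obj_px: int, stride: int) -> float:
    """How many feature-map cells (per side) an object of obj_px pixels spans."""
    return obj_px / stride

# A 20x20-pixel vehicle, a common size in aerial imagery:
print(cells_covered(20, 8))   # 2.5 cells on P3
print(cells_covered(20, 16))  # 1.25 cells on P4
print(cells_covered(20, 32))  # 0.625 cells on P5: less than one cell
```

This is why small-object information must be preserved and re-injected by the Neck rather than recovered from the deepest level alone.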

1.3 Contributions

To address these problems, this article presents a YOLOv8 variant with an improved Neck-layer feature pyramid. The main contributions are:

  1. A weighted bidirectional feature pyramid (BiFPN) that assigns learnable weights to feature layers at different scales, yielding more effective multi-scale fusion

  2. An adaptive feature fusion (AFF) module that adjusts fusion weights dynamically through attention, strengthening the feature representation

  3. An embedded atrous spatial pyramid pooling (ASPP) module that enlarges the receptive field and captures multi-scale context

  4. Extensive experiments on VisDrone and DIOR that validate the effectiveness of the improvements

  5. Complete PyTorch code and training configurations, making the work easy for researchers and engineers to reproduce and apply

2. Related Work

2.1 Object Detection in Aerial Imagery

Recent research on aerial object detection has concentrated on the following directions:

Data augmentation: for small objects, methods such as Mosaic and MixUp increase sample diversity, while random cropping and rescaling help simulate the scale changes seen at different flight altitudes.
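Mosaic composes four training images into one canvas, which multiplies the number of (effectively shrunken) objects seen per batch. A coordinate-level sketch of the idea (pure Python; deliberately simplified relative to real implementations, which typically work on an enlarged canvas and then crop):

```python
import random

def mosaic_quadrants(canvas: int, cx: int, cy: int):
    """Return four (x1, y1, x2, y2) paste regions tiling the canvas around (cx, cy)."""
    return [
        (0, 0, cx, cy),            # top-left image
        (cx, 0, canvas, cy),       # top-right image
        (0, cy, cx, canvas),       # bottom-left image
        (cx, cy, canvas, canvas),  # bottom-right image
    ]

random.seed(0)
c = 640
cx = random.randint(c // 4, 3 * c // 4)
cy = random.randint(c // 4, 3 * c // 4)
regions = mosaic_quadrants(c, cx, cy)
# The four regions tile the canvas exactly once, whatever the center:
area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in regions)
print(area == c * c)  # True
```

Each source image is resized to its quadrant, and its box labels are shifted by the quadrant offset.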

Multi-scale architectures: feature pyramids such as FPN and PANet have become standard in aerial detection, and later designs such as NAS-FPN and BiFPN further refine the fusion scheme.

Attention mechanisms: modules such as SE-Net, CBAM, and Coordinate Attention are widely used in aerial detection to focus the model on key regions.

2.2 Evolution of the YOLO Family

YOLO has evolved from YOLOv1 through YOLOv8:

  • YOLOv1-v3: established the single-stage detection paradigm and introduced anchors and multi-scale prediction

  • YOLOv4: introduced the CSPDarknet backbone and a PANet neck

  • YOLOv5: refined data augmentation and training strategy, with an easy-to-use engineering implementation

  • YOLOv6-v7: further optimized the network structure and training tricks

  • YOLOv8: adopts an anchor-free design with the C2f module and a decoupled head, striking a better accuracy/speed balance

2.3 Improvements to Feature Pyramid Networks

Work on improving feature pyramid networks falls into three main directions:

  1. Path augmentation: PANet adds a bottom-up path, shortening the information flow

  2. Neural architecture search: NAS-FPN searches for an optimal feature-fusion topology

  3. Weighted fusion: BiFPN learns a weight per input feature, enabling efficient multi-scale fusion

3. Design of the Improved YOLOv8

3.1 Overall Architecture

The overall architecture of the improved YOLOv8, shown in Figure 1, has three parts:

  • Backbone: a CSPDarknet structure with C2f modules that extracts multi-scale features

  • Neck: the improved feature pyramid, combining the BiFPN structure, the AFF module, and the ASPP module

  • Head: a decoupled detection head that predicts classes and bounding boxes separately

3.2 Weighted Bidirectional Feature Pyramid (BiFPN)

Standard FPN upsamples high-level semantic features and adds them to lower levels, ignoring how differently each scale contributes to the final prediction. BiFPN introduces learnable weight parameters to perform a weighted fusion instead.

Mathematical formulation
For the feature at level i, the fusion is:

```text
P_i = Conv( (w1 * P_i^in + w2 * Resize(P_{i+1}^out)) / (w1 + w2 + ε) )
```

where w1 and w2 are learnable parameters, Resize is the upsampling operation, and ε = 0.0001 prevents division by zero.
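This "fast normalized fusion" is simple enough to check by hand. A pure-Python sketch (the real module operates on tensors, but the weight arithmetic is identical; the scalar "features" here are stand-ins):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: ReLU the weights, normalize by their sum, then blend."""
    w = [max(0.0, wi) for wi in weights]  # ReLU: a negative weight contributes nothing
    total = sum(w) + eps
    return sum(wi * f for wi, f in zip(w, features)) / total

# Two scalars standing in for P_i^in and the resized P_{i+1}^out:
print(fast_normalized_fusion([10.0, 20.0], [1.0, 1.0]))   # ≈ 15.0 (equal weights)
print(fast_normalized_fusion([10.0, 20.0], [3.0, 1.0]))   # ≈ 12.5 (favors the first input)
print(fast_normalized_fusion([10.0, 20.0], [1.0, -5.0]))  # ≈ 10.0 (negative weight clamped out)
```

Because the normalized weights always sum to (almost) 1, the fused feature stays in the same value range as its inputs, which keeps training stable without the cost of a softmax.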

PyTorch implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiFPNBlock(nn.Module):
    def __init__(self, channels, num_levels=5):
        super(BiFPNBlock, self).__init__()
        self.num_levels = num_levels
        self.channels = channels
        # Learnable fusion weights
        self.w1 = nn.Parameter(torch.ones(2, num_levels))
        self.w2 = nn.Parameter(torch.ones(3, num_levels - 2))
        # Convolution layers
        self.conv_up = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, 1, 1) for _ in range(num_levels)
        ])
        self.conv_down = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, 1, 1) for _ in range(num_levels - 2)
        ])
        self.relu = nn.ReLU()

    def forward(self, inputs):
        # inputs: list of multi-scale features [P3, P4, P5, P6, P7]
        assert len(inputs) == self.num_levels
        # Normalize the weights
        w1 = self.relu(self.w1)
        w1 = w1 / (torch.sum(w1, dim=0, keepdim=True) + 0.0001)
        w2 = self.relu(self.w2)
        w2 = w2 / (torch.sum(w2, dim=0, keepdim=True) + 0.0001)
        # Top-down path
        outputs = []
        for i in range(self.num_levels - 1, -1, -1):
            if i == self.num_levels - 1:
                outputs.append(inputs[i])
            else:
                # Upsample and fuse
                up_feat = F.interpolate(outputs[-1], size=inputs[i].shape[2:], mode='nearest')
                fused = w1[0, i] * inputs[i] + w1[1, i] * up_feat
                outputs.append(self.conv_up[i](fused))
        outputs = outputs[::-1]  # restore P3..P7 order
        # Bottom-up path
        final_outputs = [outputs[0]]
        for i in range(1, self.num_levels - 1):
            # Downsample and fuse
            down_feat = F.max_pool2d(final_outputs[-1], 2)
            fused = (w2[0, i - 1] * outputs[i]
                     + w2[1, i - 1] * down_feat
                     + w2[2, i - 1] * inputs[i])
            final_outputs.append(self.conv_down[i - 1](fused))
        final_outputs.append(outputs[-1])
        return final_outputs
```

3.3 Adaptive Feature Fusion Module (AFF)

Features at different scales carry different kinds of information: shallow features preserve detail but are semantically weak, while deep features are semantically rich but spatially coarse. The AFF module uses attention to adjust the fusion weights dynamically, letting the network select the important features based on the input content.

Module structure
AFF has two branches:

  1. A global-context branch that captures global information via global average pooling

  2. A local-detail branch that preserves local detail via 1x1 convolutions
PyTorch implementation (a spatial-alignment step is added at the top of `forward`, since the shallow and deep inputs generally differ in resolution):

```python
class AFFModule(nn.Module):
    def __init__(self, channels, reduction=8):
        super(AFFModule, self).__init__()
        self.channels = channels
        # Global branch
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )
        # Local branch
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid()
        )
        # Fusion weights
        self.fusion_weight = nn.Parameter(torch.ones(2))

    def forward(self, x, y):
        # x: shallow feature, y: deep feature
        B, C, H, W = x.shape
        # Bring the deep feature to the shallow feature's resolution
        if y.shape[2:] != x.shape[2:]:
            y = F.interpolate(y, size=x.shape[2:], mode='nearest')
        # Global attention (from the deep feature)
        global_att = self.global_avg_pool(y).view(B, C)
        global_att = self.global_fc(global_att).view(B, C, 1, 1)
        # Local attention (from the shallow feature)
        local_att = self.local_conv(x)
        # Combine the two attention maps
        weight = torch.softmax(self.fusion_weight, dim=0)
        fused_att = weight[0] * global_att + weight[1] * local_att
        # Gate the two inputs with the fused attention
        out = x * fused_att + y * (1 - fused_att)
        return out
```
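The final step `out = x * a + y * (1 - a)` is a convex combination gated by the attention value. A scalar sketch of the weight arithmetic (pure Python; shapes, convolutions, and sigmoids omitted, so the attention values are given directly):

```python
import math

def softmax(v):
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

def aff_blend(x, y, global_att, local_att, fusion_weight):
    """Scalar stand-in for AFF: combine two attention values, then gate x vs y."""
    w = softmax(fusion_weight)
    a = w[0] * global_att + w[1] * local_att  # fused attention in [0, 1]
    return x * a + y * (1 - a)

# Equal fusion weights (the module's initialization) average the two attentions:
print(aff_blend(x=2.0, y=4.0, global_att=0.9, local_att=0.5, fusion_weight=[1.0, 1.0]))
# a = 0.5*0.9 + 0.5*0.5 = 0.7  →  2.0*0.7 + 4.0*0.3 = 2.6
```

Because `a` stays in [0, 1], the output never leaves the range spanned by the two inputs; the network only decides how much of each to keep.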

3.4 Atrous Spatial Pyramid Pooling (ASPP)

The ASPP module applies parallel atrous (dilated) convolutions with different dilation rates to extract multi-scale context, enlarging the receptive field without adding parameters. For large objects in aerial images (such as buildings), ASPP captures global context and suppresses background clutter.
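A dilated 3x3 convolution with rate d behaves like a sparse kernel of effective size k + (k-1)(d-1), which is why the rates 6, 12, and 18 used below cover such different context ranges. A quick check of the arithmetic (pure Python; this is the standard dilation formula, independent of any framework):

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective receptive size (per side) of a k x k conv with dilation d."""
    return k + (k - 1) * (d - 1)

for rate in [1, 6, 12, 18]:
    print(rate, effective_kernel(3, rate))
# 1 -> 3, 6 -> 13, 12 -> 25, 18 -> 37 pixels per side
```

All four branches keep the same 9 weights per channel pair; only the sampling stride inside the kernel grows.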

PyTorch implementation

```python
class ASPPModule(nn.Module):
    def __init__(self, in_channels, out_channels, rates=(6, 12, 18)):
        super(ASPPModule, self).__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Atrous convolution branches
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[0], dilation=rates[0]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[1], dilation=rates[1]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[2], dilation=rates[2]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Global average pooling branch
        self.branch5 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Fusion convolutions
        self.fusion = nn.Sequential(
            nn.Conv2d(out_channels * 5, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )

    def forward(self, x):
        size = x.shape[2:]
        # Compute each branch
        out1 = self.branch1(x)
        out2 = self.branch2(x)
        out3 = self.branch3(x)
        out4 = self.branch4(x)
        # The global branch must be upsampled back to the input size
        out5 = self.branch5(x)
        out5 = F.interpolate(out5, size=size, mode='bilinear', align_corners=False)
        # Concatenate and fuse
        out = torch.cat([out1, out2, out3, out4, out5], dim=1)
        out = self.fusion(out)
        return out
```

3.5 The Improved Neck

The modules above are integrated into the YOLOv8 Neck to build the enhanced feature pyramid (note that ASPP operates on features that have already been projected to `channels[0]` channels):

```python
class ImprovedYOLOv8Neck(nn.Module):
    def __init__(self, channels=(256, 512, 768, 768), num_levels=4):
        super(ImprovedYOLOv8Neck, self).__init__()
        # BiFPN layers
        self.bifpn1 = BiFPNBlock(channels[0], num_levels)
        self.bifpn2 = BiFPNBlock(channels[0], num_levels)
        # AFF modules
        self.aff_modules = nn.ModuleList([
            AFFModule(channels[0]) for _ in range(num_levels - 1)
        ])
        # ASPP module (applied to the deepest level, after channel projection)
        self.aspp = ASPPModule(channels[0], channels[0])
        # Channel projection layers
        self.conv_down = nn.ModuleList([
            nn.Conv2d(channels[i], channels[0], 1) for i in range(1, len(channels))
        ])
        self.conv_up = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i], 1) for i in range(1, len(channels))
        ])

    def forward(self, features):
        # features from the backbone: [P3, P4, P5, P6]
        # Project everything to a common channel count
        proj_features = [features[0]]
        for i, feat in enumerate(features[1:]):
            proj_features.append(self.conv_down[i](feat))
        # First BiFPN pass
        bifpn_out1 = self.bifpn1(proj_features)
        # AFF feature enhancement
        aff_out = []
        for i in range(len(bifpn_out1) - 1):
            aff_out.append(self.aff_modules[i](bifpn_out1[i], bifpn_out1[i + 1]))
        aff_out.append(bifpn_out1[-1])
        # ASPP on the deepest features
        aff_out[-1] = self.aspp(aff_out[-1])
        # Second BiFPN pass
        bifpn_out2 = self.bifpn2(aff_out)
        # Restore the original channel counts
        final_features = [bifpn_out2[0]]
        for i, feat in enumerate(bifpn_out2[1:]):
            final_features.append(self.conv_up[i](feat))
        return final_features
```

4. Complete Code Implementation

4.1 Environment Setup

```bash
# Create a conda environment
conda create -n yolo_aerial python=3.9
conda activate yolo_aerial

# Install PyTorch (pick the build matching your CUDA version)
pip install torch==2.0.0 torchvision==0.15.0

# Install Ultralytics YOLOv8
pip install ultralytics

# Other dependencies
pip install numpy opencv-python tqdm tensorboard pyyaml
```

4.2 Defining the Improved YOLOv8 Model

Create models/improved_yolov8.py (the config-parsing logic follows the structure of Ultralytics' own `parse_model`; module names in the YAML must resolve either to `torch.nn` or to this file's globals):

```python
import math
from copy import deepcopy

import torch
import torch.nn as nn
import yaml

from ultralytics.nn.modules import (C1, C2, C3, C3TR, SPP, SPPF, Bottleneck,
                                    BottleneckCSP, C2f, C3Ghost, C3x, Concat,
                                    Conv, DWConv, DWConvTranspose2d, Focus,
                                    GhostBottleneck, GhostConv, RepC3, RepConv,
                                    Detect, Segment)


class ImprovedYOLOv8(nn.Module):
    def __init__(self, cfg='yolov8n.yaml', ch=3, nc=None):
        super().__init__()
        # Load the model config
        with open(cfg, encoding='utf-8') as f:
            self.yaml = yaml.safe_load(f)
        if nc is not None:
            self.yaml['nc'] = nc
        # Build the model
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch)
        self.names = {i: f'class{i}' for i in range(self.yaml['nc'])}
        self.inplace = self.yaml.get('inplace', True)
        self.init_weights()

    def forward(self, x):
        y = []
        for m in self.model:
            if m.f != -1:
                # Gather this layer's inputs from earlier outputs
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
            x = m(x)
            y.append(x if m.i in self.save else None)
        return x

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def parse_model(d, ch):
    """Parse a YOLO model config dict into an nn.Sequential."""
    nc = d.get('nc')
    ch = [ch]
    layers, save = [], []
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):
        # Resolve the module class: 'nn.Upsample' -> torch.nn, otherwise a global name
        m = getattr(nn, m[3:]) if isinstance(m, str) and m.startswith('nn.') else globals()[m]
        n = max(round(n * d.get('depth_multiple', 1.0)), 1) if n > 1 else n
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv,
                 Focus, BottleneckCSP, C1, C2, C2f, C3, C3TR, C3Ghost,
                 nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, RepConv):
            c1, c2 = ch[f], args[0]
            if c2 != nc:
                c2 = make_divisible(c2 * d.get('width_multiple', 1.0), 8)
            args = [c1, c2, *args[1:]]
            if m in (C2f, C3, C3TR, C3Ghost, C3x, RepC3):
                args.insert(2, n)  # the repeat count goes inside the block
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m in (Detect, Segment):
            args.append([ch[x] for x in f])
            c2 = ch[f[-1]]
        else:
            c2 = ch[f]
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)
        m_.i, m_.f, m_.type = i, f, str(m)
        m_.np = sum(x.numel() for x in m_.parameters())
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
```

4.3 Concrete Implementation of the Improved Neck

Create models/improved_neck.py (as in Section 3.3, `AFFModule.forward` aligns the two inputs spatially before gating them):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from ultralytics.nn.modules import Conv


class BiFPNBlock(nn.Module):
    """Weighted bidirectional feature pyramid block."""

    def __init__(self, channels, num_levels=4):
        super().__init__()
        self.channels = channels
        self.num_levels = num_levels
        # Learnable fusion weights
        self.w_topdown = nn.Parameter(torch.ones(2, num_levels))
        self.w_bottomup = nn.Parameter(torch.ones(3, num_levels - 2))
        # Convolution layers
        self.conv_topdown = nn.ModuleList([
            Conv(channels, channels, 3, 1, 1) for _ in range(num_levels)
        ])
        self.conv_bottomup = nn.ModuleList([
            Conv(channels, channels, 3, 1, 1) for _ in range(num_levels - 2)
        ])

    def forward(self, feats):
        # feats: list of features [P3, P4, P5, P6]
        # Normalize the weights
        w_td = F.relu(self.w_topdown)
        w_td = w_td / (w_td.sum(dim=0, keepdim=True) + 1e-5)
        w_bu = F.relu(self.w_bottomup)
        w_bu = w_bu / (w_bu.sum(dim=0, keepdim=True) + 1e-5)
        # Top-down path
        td_feats = [feats[-1]]
        for i in range(self.num_levels - 2, -1, -1):
            # Upsample and fuse
            up_feat = F.interpolate(td_feats[0], size=feats[i].shape[2:], mode='nearest')
            fused = w_td[0, i] * feats[i] + w_td[1, i] * up_feat
            td_feats.insert(0, self.conv_topdown[i](fused))
        # Bottom-up path
        bu_feats = [td_feats[0]]
        for i in range(1, self.num_levels - 1):
            # Downsample, then fuse the top-down feature, the downsampled
            # feature, and the raw input of this level
            down_feat = F.max_pool2d(bu_feats[-1], 2)
            fused = (w_bu[0, i - 1] * td_feats[i]
                     + w_bu[1, i - 1] * down_feat
                     + w_bu[2, i - 1] * feats[i])
            bu_feats.append(self.conv_bottomup[i - 1](fused))
        bu_feats.append(td_feats[-1])
        return bu_feats


class AFFModule(nn.Module):
    """Adaptive feature fusion module."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channels = channels
        # Global attention branch
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )
        # Local attention branch
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
            nn.Sigmoid()
        )
        # Fusion weights
        self.fusion_weight = nn.Parameter(torch.ones(2))

    def forward(self, low_feat, high_feat):
        B, C, H, W = low_feat.shape
        # Bring the deep feature to the shallow feature's resolution
        if high_feat.shape[2:] != low_feat.shape[2:]:
            high_feat = F.interpolate(high_feat, size=low_feat.shape[2:], mode='nearest')
        # Global attention (from the deep feature)
        global_att = self.gap(high_feat).view(B, C)
        global_att = self.global_fc(global_att).view(B, C, 1, 1)
        # Local attention (from the shallow feature)
        local_att = self.local_conv(low_feat)
        # Combine the attention maps
        weight = torch.softmax(self.fusion_weight, dim=0)
        fused_att = weight[0] * global_att + weight[1] * local_att
        # Gate and merge
        enhanced_low = low_feat * fused_att
        enhanced_high = high_feat * (1 - fused_att)
        return enhanced_low + enhanced_high


class ASPPModule(nn.Module):
    """Atrous spatial pyramid pooling."""

    def __init__(self, in_channels, out_channels, rates=(6, 12, 18)):
        super().__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Atrous convolution branches
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[0], dilation=rates[0], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[1], dilation=rates[1], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[2], dilation=rates[2], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Global average pooling branch
        self.branch5 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Fusion convolutions
        self.fusion = nn.Sequential(
            nn.Conv2d(out_channels * 5, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        size = x.shape[2:]
        # Compute each branch
        out1 = self.branch1(x)
        out2 = self.branch2(x)
        out3 = self.branch3(x)
        out4 = self.branch4(x)
        out5 = self.branch5(x)
        out5 = F.interpolate(out5, size=size, mode='bilinear', align_corners=False)
        # Concatenate and fuse
        out = torch.cat([out1, out2, out3, out4, out5], dim=1)
        return self.fusion(out)


class ImprovedNeck(nn.Module):
    """The improved feature pyramid Neck."""

    def __init__(self, channels=(256, 512, 768, 768), num_levels=4):
        super().__init__()
        self.num_levels = num_levels
        # Projection layers (unify channel counts)
        self.proj_convs = nn.ModuleList([
            nn.Conv2d(channels[i], channels[0], 1) for i in range(1, len(channels))
        ])
        # BiFPN layers
        self.bifpn1 = BiFPNBlock(channels[0], num_levels)
        self.bifpn2 = BiFPNBlock(channels[0], num_levels)
        # AFF modules
        self.aff_modules = nn.ModuleList([
            AFFModule(channels[0]) for _ in range(num_levels - 1)
        ])
        # ASPP module
        self.aspp = ASPPModule(channels[0], channels[0])
        # Restore channel counts
        self.restore_convs = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i], 1) for i in range(1, len(channels))
        ])

    def forward(self, feats):
        # feats: [P3, P4, P5, P6] from the backbone
        # 1. Unify channel counts
        proj_feats = [feats[0]]
        for i, feat in enumerate(feats[1:]):
            proj_feats.append(self.proj_convs[i](feat))
        # 2. First BiFPN pass
        bifpn_out1 = self.bifpn1(proj_feats)
        # 3. AFF feature enhancement
        aff_out = []
        for i in range(self.num_levels - 1):
            aff_out.append(self.aff_modules[i](bifpn_out1[i], bifpn_out1[i + 1]))
        aff_out.append(bifpn_out1[-1])
        # 4. ASPP on the deepest level, with a residual connection
        aff_out[-1] = self.aspp(aff_out[-1]) + aff_out[-1]
        # 5. Second BiFPN pass
        bifpn_out2 = self.bifpn2(aff_out)
        # 6. Restore the original channel counts
        final_feats = [bifpn_out2[0]]
        for i, feat in enumerate(bifpn_out2[1:]):
            final_feats.append(self.restore_convs[i](feat))
        return final_feats
```

4.4 Training Script

Create train.py (the model is built from the custom YAML that wires in the improved Neck, then the pretrained weights are transferred):

```python
import argparse

from ultralytics import YOLO


def train_improved_yolo():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default='VisDrone.yaml', help='dataset config')
    parser.add_argument('--weights', type=str, default='yolov8n.pt', help='initial weights path')
    parser.add_argument('--epochs', type=int, default=300, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size')
    parser.add_argument('--imgsz', type=int, default=640, help='image size')
    parser.add_argument('--device', type=str, default='', help='cuda device')
    parser.add_argument('--project', type=str, default='runs/train', help='project name')
    parser.add_argument('--name', type=str, default='exp', help='experiment name')
    args = parser.parse_args()

    # Build from the custom config, then load the pretrained weights
    model = YOLO('models/improved_yolov8.yaml').load(args.weights)

    results = model.train(
        data=args.data,
        epochs=args.epochs,
        batch=args.batch_size,
        imgsz=args.imgsz,
        device=args.device,
        project=args.project,
        name=args.name,
        # Data augmentation
        hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
        degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0,
        flipud=0.0, fliplr=0.5,
        mosaic=1.0, mixup=0.0, copy_paste=0.0,
        # Optimizer
        optimizer='SGD', lr0=0.01, lrf=0.01,
        momentum=0.937, weight_decay=0.0005,
        warmup_epochs=3, warmup_momentum=0.8, warmup_bias_lr=0.1,
        # Loss weights
        box=7.5, cls=0.5, dfl=1.5,
        # Training details
        label_smoothing=0.0, nbs=64,
        overlap_mask=True, mask_ratio=4, dropout=0.0,
        val=True, plots=True,
    )
    return results


if __name__ == '__main__':
    train_improved_yolo()
```

4.5 Validation Script

Create val.py:

```python
import argparse

from ultralytics import YOLO


def validate():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, required=True, help='model path')
    parser.add_argument('--data', type=str, default='VisDrone.yaml', help='dataset config')
    parser.add_argument('--batch-size', type=int, default=16, help='batch size')
    parser.add_argument('--imgsz', type=int, default=640, help='image size')
    parser.add_argument('--device', type=str, default='', help='cuda device')
    args = parser.parse_args()

    model = YOLO(args.weights)
    metrics = model.val(
        data=args.data,
        batch=args.batch_size,
        imgsz=args.imgsz,
        device=args.device,
        plots=True,
        save_json=True,
    )
    print(f"mAP50: {metrics.box.map50:.4f}")
    print(f"mAP50-95: {metrics.box.map:.4f}")


if __name__ == '__main__':
    validate()
```
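Both mAP50 and mAP50-95 are built on box IoU (intersection over union): a prediction counts as a true positive only if its IoU with a ground-truth box clears the threshold. A minimal pure-Python IoU for sanity-checking matches by hand (a sketch, not the Ultralytics implementation):

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by 5 pixels horizontally:
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50/150 ≈ 0.333, below the 0.5 mAP50 threshold
```

For a 10-pixel object, a 5-pixel localization error already fails the mAP50 threshold, which is exactly why small-object metrics are so sensitive to the Neck design.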

5. Experimental Setup and Datasets

5.1 Datasets

5.1.1 VisDrone

VisDrone is a large-scale aerial dataset collected by the Machine Learning and Data Mining Lab at Tianjin University, containing 288 video clips and 10,209 static images covering diverse scenes across 14 Chinese cities. The annotated categories are:

  • pedestrian

  • people

  • bicycle

  • car

  • van

  • truck

  • tricycle

  • awning-tricycle

  • bus

  • motor

Dataset characteristics:

  • Image resolution: up to 2000x1500

  • Object scale: many small objects, some only about 10x10 pixels

  • Scene diversity: urban, rural, highway, plaza, and more

Dataset download

```bash
# Official download page
# https://github.com/VisDrone/VisDrone-Dataset

# Or download directly
wget https://download.visdrone.org/VisDrone2019-DET-train.zip
wget https://download.visdrone.org/VisDrone2019-DET-val.zip
wget https://download.visdrone.org/VisDrone2019-DET-test-dev.zip
```
5.1.2 DIOR

DIOR is a large-scale dataset for object detection in optical remote-sensing images, containing 23,463 images and 192,472 instances across 20 categories:

  • airplane

  • airport

  • baseballfield

  • basketballcourt

  • bridge

  • chimney

  • dam

  • Expressway-Service-area

  • Expressway-toll-station

  • golffield

  • groundtrackfield

  • harbor

  • overpass

  • ship

  • stadium

  • storagetank

  • tenniscourt

  • trainstation

  • vehicle

  • windmill

Dataset characteristics:

  • Image resolution: 800x800

  • Scene variation: large scale changes across different seasons and weather conditions

  • Object diversity: both man-made and natural targets

5.2 Dataset Preprocessing

Convert the datasets to YOLO format:

```python
# prepare_dataset.py
import os

import cv2
from tqdm import tqdm


def convert_visdrone_to_yolo(visdrone_path, output_path):
    """Convert VisDrone annotations to YOLO format."""
    img_dir = os.path.join(visdrone_path, 'images')
    ann_dir = os.path.join(visdrone_path, 'annotations')
    os.makedirs(os.path.join(output_path, 'images'), exist_ok=True)
    os.makedirs(os.path.join(output_path, 'labels'), exist_ok=True)

    # VisDrone raw categories:
    # 0: ignored, 1: pedestrian, 2: people, 3: bicycle, 4: car,
    # 5: van, 6: truck, 7: tricycle, 8: awning-tricycle, 9: bus,
    # 10: motor, 11: others
    valid_classes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # drop 0 and 11
    class_map = {cls: idx for idx, cls in enumerate(valid_classes)}

    ann_files = [f for f in os.listdir(ann_dir) if f.endswith('.txt')]
    for ann_file in tqdm(ann_files, desc='Converting'):
        img_file = ann_file.replace('.txt', '.jpg')
        img_path = os.path.join(img_dir, img_file)
        if not os.path.exists(img_path):
            continue
        # Read the image to get its size
        img = cv2.imread(img_path)
        if img is None:
            continue
        h, w = img.shape[:2]

        # Read the annotations
        with open(os.path.join(ann_dir, ann_file), 'r') as f:
            lines = f.readlines()

        yolo_labels = []
        for line in lines:
            parts = line.strip().split(',')
            if len(parts) < 8:
                continue
            obj_class = int(parts[5])
            if obj_class not in valid_classes:
                continue
            # VisDrone boxes are [x, y, w, h] in pixels
            bbox = [int(parts[0]), int(parts[1]), int(parts[2]), int(parts[3])]
            # Convert to normalized YOLO format [class_id, x_center, y_center, width, height]
            x_center = (bbox[0] + bbox[2] / 2) / w
            y_center = (bbox[1] + bbox[3] / 2) / h
            width = bbox[2] / w
            height = bbox[3] / h
            # Clamp coordinates to [0, 1]
            x_center = max(0, min(1, x_center))
            y_center = max(0, min(1, y_center))
            width = max(0, min(1, width))
            height = max(0, min(1, height))
            yolo_labels.append(
                f"{class_map[obj_class]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")

        # Save the image and its labels
        cv2.imwrite(os.path.join(output_path, 'images', img_file), img)
        with open(os.path.join(output_path, 'labels', ann_file), 'w') as f:
            f.write('\n'.join(yolo_labels))


if __name__ == '__main__':
    convert_visdrone_to_yolo('VisDrone2019-DET-train', 'VisDrone_YOLO/train')
    convert_visdrone_to_yolo('VisDrone2019-DET-val', 'VisDrone_YOLO/val')
```
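The coordinate conversion at the heart of the script is easy to verify on a single box (pure Python; the image size and box below are made-up illustrative values):

```python
def to_yolo(x, y, bw, bh, img_w, img_h):
    """Convert a top-left [x, y, w, h] pixel box to normalized YOLO center format."""
    return (
        (x + bw / 2) / img_w,  # x_center
        (y + bh / 2) / img_h,  # y_center
        bw / img_w,            # width
        bh / img_h,            # height
    )

# A 100x50 box whose top-left corner is at (300, 200) in a 2000x1500 image:
print(to_yolo(300, 200, 100, 50, 2000, 1500))
# (0.175, 0.15, 0.05, 0.0333...)
```

Note that VisDrone stores the top-left corner while YOLO expects the center, which is why the half-width and half-height offsets are added before normalizing.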

5.3 Dataset Config Files

Create VisDrone.yaml:

```yaml
# VisDrone dataset config
path: ./VisDrone_YOLO  # dataset root
train: images/train    # training images
val: images/val        # validation images
test: images/test      # test images

# Number of classes
nc: 10

# Class names
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

Create DIOR.yaml:

```yaml
# DIOR dataset config
path: ./DIOR_YOLO
train: images/train
val: images/val
test: images/test

nc: 20

names:
  0: airplane
  1: airport
  2: baseballfield
  3: basketballcourt
  4: bridge
  5: chimney
  6: dam
  7: Expressway-Service-area
  8: Expressway-toll-station
  9: golffield
  10: groundtrackfield
  11: harbor
  12: overpass
  13: ship
  14: stadium
  15: storagetank
  16: tenniscourt
  17: trainstation
  18: vehicle
  19: windmill
```
