
Improving the Neck-Layer Feature Pyramid of YOLO for Aerial Image Detection: A Complete Implementation and Performance Optimization Guide

Abstract

Aerial image detection has important applications in UAV inspection, smart cities, agricultural monitoring, and related fields. However, aerial images exhibit large variations in object scale, complex backgrounds, and dense small objects, placing higher demands on detection algorithms. This article presents a YOLOv8 variant with an improved Neck-layer feature pyramid: by introducing a weighted bidirectional feature pyramid (BiFPN), an adaptive feature fusion (AFF) module, and atrous spatial pyramid pooling (ASPP), it significantly improves multi-scale detection in aerial imagery. In experiments on two public aerial datasets, VisDrone and DIOR, the improved algorithm raises mAP50 by 5.2% and 4.8% respectively, with particularly notable gains on small objects. The article covers the algorithm design, the PyTorch implementation, training tips, and experimental analysis, providing a complete solution for aerial image detection tasks.

Keywords: YOLOv8; feature pyramid; aerial imagery; small object detection; BiFPN

1. Introduction

1.1 Challenges of Aerial Image Detection

With the spread of UAV technology, aerial image analysis has become a research hotspot in computer vision. Compared with natural-scene images, aerial images have the following notable characteristics:

  1. Extreme scale variation: a single aerial image may contain large structures such as buildings and parking lots alongside pedestrians and vehicles only a few dozen pixels in size, so multi-scale behavior is especially pronounced

  2. Dense small objects: tightly parked vehicles, crowds, and similar targets occlude one another and are hard to separate

  3. Complex background clutter: illumination changes, shadows, and building textures produce large numbers of false alarms

  4. Unusual viewpoint: the top-down perspective makes object appearance differ substantially from that in conventional datasets

1.2 Limitations of the YOLO Family on Aerial Detection

With their end-to-end architecture and real-time inference speed, YOLO detectors have become the most widely deployed detection framework in industry. Standard YOLO, however, has the following problems on aerial images:

  1. Simplistic feature pyramid: conventional FPN or PANet fuses features along simple top-down and bottom-up paths, treating all scales as equal contributors and lacking adaptivity

  2. Loss of small-object features: as the network deepens, the semantic signal of small objects all but vanishes after repeated downsampling

  3. Limited fusion operations: plain addition or concatenation cannot fully exploit the complementary information across scales
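The downsampling problem is easy to quantify: at a stride-32 output level, an object smaller than 32x32 pixels occupies less than one feature-map cell. A quick back-of-the-envelope check (pure Python, no framework needed; the 20-pixel vehicle is an illustrative size, not a dataset statistic):

```python
# Footprint of an object on each YOLO output level.
# A 640x640 input yields P3/P4/P5 feature maps at strides 8/16/32.
def cells_covered(obj_px: int, stride: int) -> float:
    """How many feature-map cells (per side) an object of obj_px pixels spans."""
    return obj_px / stride

# A 20x20-pixel vehicle, a common size in aerial imagery:
print(cells_covered(20, 8))   # 2.5 cells on P3
print(cells_covered(20, 16))  # 1.25 cells on P4
print(cells_covered(20, 32))  # 0.625 cells on P5: less than one cell
```

This is why small-object information must be preserved and re-injected by the Neck rather than recovered from the deepest level alone.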

1.3 Contributions

To address these problems, this article presents a YOLOv8 variant with an improved Neck-layer feature pyramid. The main contributions are:

  1. A weighted bidirectional feature pyramid (BiFPN) that assigns learnable weights to feature layers at different scales, yielding more effective multi-scale fusion

  2. An adaptive feature fusion (AFF) module that adjusts fusion weights dynamically through attention, strengthening the feature representation

  3. An embedded atrous spatial pyramid pooling (ASPP) module that enlarges the receptive field and captures multi-scale context

  4. Extensive experiments on VisDrone and DIOR that validate the effectiveness of the improvements

  5. Complete PyTorch code and training configurations, making the work easy for researchers and engineers to reproduce and apply

2. Related Work

2.1 Object Detection in Aerial Imagery

Recent research on aerial object detection has concentrated on the following directions:

Data augmentation: for small objects, methods such as Mosaic and MixUp increase sample diversity, while random cropping and rescaling help simulate the scale changes seen at different flight altitudes.
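Mosaic composes four training images into one canvas, which multiplies the number of (effectively shrunken) objects seen per batch. A coordinate-level sketch of the idea (pure Python; deliberately simplified relative to real implementations, which typically work on an enlarged canvas and then crop):

```python
import random

def mosaic_quadrants(canvas: int, cx: int, cy: int):
    """Return four (x1, y1, x2, y2) paste regions tiling the canvas around (cx, cy)."""
    return [
        (0, 0, cx, cy),            # top-left image
        (cx, 0, canvas, cy),       # top-right image
        (0, cy, cx, canvas),       # bottom-left image
        (cx, cy, canvas, canvas),  # bottom-right image
    ]

random.seed(0)
c = 640
cx = random.randint(c // 4, 3 * c // 4)
cy = random.randint(c // 4, 3 * c // 4)
regions = mosaic_quadrants(c, cx, cy)
# The four regions tile the canvas exactly once, whatever the center:
area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in regions)
print(area == c * c)  # True
```

Each source image is resized to its quadrant, and its box labels are shifted by the quadrant offset.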

Multi-scale architectures: feature pyramids such as FPN and PANet have become standard in aerial detection, and later designs such as NAS-FPN and BiFPN further refine the fusion scheme.

Attention mechanisms: modules such as SE-Net, CBAM, and Coordinate Attention are widely used in aerial detection to focus the model on key regions.

2.2 Evolution of the YOLO Family

YOLO has evolved from YOLOv1 through YOLOv8:

  • YOLOv1-v3: established the single-stage detection paradigm and introduced anchors and multi-scale prediction

  • YOLOv4: introduced the CSPDarknet backbone and a PANet neck

  • YOLOv5: refined data augmentation and training strategy, with an easy-to-use engineering implementation

  • YOLOv6-v7: further optimized the network structure and training tricks

  • YOLOv8: adopts an anchor-free design with the C2f module and a decoupled head, striking a better accuracy/speed balance

2.3 Improvements to Feature Pyramid Networks

Work on improving feature pyramid networks falls into three main directions:

  1. Path augmentation: PANet adds a bottom-up path, shortening the information flow

  2. Neural architecture search: NAS-FPN searches for an optimal feature-fusion topology

  3. Weighted fusion: BiFPN learns a weight per input feature, enabling efficient multi-scale fusion

3. Design of the Improved YOLOv8

3.1 Overall Architecture

The overall architecture of the improved YOLOv8, shown in Figure 1, has three parts:

  • Backbone: a CSPDarknet structure with C2f modules that extracts multi-scale features

  • Neck: the improved feature pyramid, combining the BiFPN structure, the AFF module, and the ASPP module

  • Head: a decoupled detection head that predicts classes and bounding boxes separately

3.2 Weighted Bidirectional Feature Pyramid (BiFPN)

Standard FPN upsamples high-level semantic features and adds them to lower levels, ignoring how differently each scale contributes to the final prediction. BiFPN introduces learnable weight parameters to perform a weighted fusion instead.

Mathematical formulation
For the feature at level i, the fusion is:

```text
P_i = Conv( (w1 * P_i^in + w2 * Resize(P_{i+1}^out)) / (w1 + w2 + ε) )
```

where w1 and w2 are learnable parameters, Resize is the upsampling operation, and ε = 0.0001 prevents division by zero.
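This "fast normalized fusion" is simple enough to check by hand. A pure-Python sketch (the real module operates on tensors, but the weight arithmetic is identical; the scalar "features" here are stand-ins):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: ReLU the weights, normalize by their sum, then blend."""
    w = [max(0.0, wi) for wi in weights]  # ReLU: a negative weight contributes nothing
    total = sum(w) + eps
    return sum(wi * f for wi, f in zip(w, features)) / total

# Two scalars standing in for P_i^in and the resized P_{i+1}^out:
print(fast_normalized_fusion([10.0, 20.0], [1.0, 1.0]))   # ≈ 15.0 (equal weights)
print(fast_normalized_fusion([10.0, 20.0], [3.0, 1.0]))   # ≈ 12.5 (favors the first input)
print(fast_normalized_fusion([10.0, 20.0], [1.0, -5.0]))  # ≈ 10.0 (negative weight clamped out)
```

Because the normalized weights always sum to (almost) 1, the fused feature stays in the same value range as its inputs, which keeps training stable without the cost of a softmax.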

PyTorch implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiFPNBlock(nn.Module):
    def __init__(self, channels, num_levels=5):
        super(BiFPNBlock, self).__init__()
        self.num_levels = num_levels
        self.channels = channels
        # Learnable fusion weights
        self.w1 = nn.Parameter(torch.ones(2, num_levels))
        self.w2 = nn.Parameter(torch.ones(3, num_levels - 2))
        # Convolution layers
        self.conv_up = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, 1, 1) for _ in range(num_levels)
        ])
        self.conv_down = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, 1, 1) for _ in range(num_levels - 2)
        ])
        self.relu = nn.ReLU()

    def forward(self, inputs):
        # inputs: list of multi-scale features [P3, P4, P5, P6, P7]
        assert len(inputs) == self.num_levels
        # Normalize the weights
        w1 = self.relu(self.w1)
        w1 = w1 / (torch.sum(w1, dim=0, keepdim=True) + 0.0001)
        w2 = self.relu(self.w2)
        w2 = w2 / (torch.sum(w2, dim=0, keepdim=True) + 0.0001)
        # Top-down path
        outputs = []
        for i in range(self.num_levels - 1, -1, -1):
            if i == self.num_levels - 1:
                outputs.append(inputs[i])
            else:
                # Upsample and fuse
                up_feat = F.interpolate(outputs[-1], size=inputs[i].shape[2:], mode='nearest')
                fused = w1[0, i] * inputs[i] + w1[1, i] * up_feat
                outputs.append(self.conv_up[i](fused))
        outputs = outputs[::-1]  # restore P3..P7 order
        # Bottom-up path
        final_outputs = [outputs[0]]
        for i in range(1, self.num_levels - 1):
            # Downsample and fuse
            down_feat = F.max_pool2d(final_outputs[-1], 2)
            fused = (w2[0, i - 1] * outputs[i]
                     + w2[1, i - 1] * down_feat
                     + w2[2, i - 1] * inputs[i])
            final_outputs.append(self.conv_down[i - 1](fused))
        final_outputs.append(outputs[-1])
        return final_outputs
```

3.3 Adaptive Feature Fusion Module (AFF)

Features at different scales carry different kinds of information: shallow features preserve detail but are semantically weak, while deep features are semantically rich but spatially coarse. The AFF module uses attention to adjust the fusion weights dynamically, letting the network select the important features based on the input content.

Module structure
AFF has two branches:

  1. A global-context branch that captures global information via global average pooling

  2. A local-detail branch that preserves local detail via 1x1 convolutions
PyTorch implementation (a spatial-alignment step is added at the top of `forward`, since the shallow and deep inputs generally differ in resolution):

```python
class AFFModule(nn.Module):
    def __init__(self, channels, reduction=8):
        super(AFFModule, self).__init__()
        self.channels = channels
        # Global branch
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )
        # Local branch
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid()
        )
        # Fusion weights
        self.fusion_weight = nn.Parameter(torch.ones(2))

    def forward(self, x, y):
        # x: shallow feature, y: deep feature
        B, C, H, W = x.shape
        # Bring the deep feature to the shallow feature's resolution
        if y.shape[2:] != x.shape[2:]:
            y = F.interpolate(y, size=x.shape[2:], mode='nearest')
        # Global attention (from the deep feature)
        global_att = self.global_avg_pool(y).view(B, C)
        global_att = self.global_fc(global_att).view(B, C, 1, 1)
        # Local attention (from the shallow feature)
        local_att = self.local_conv(x)
        # Combine the two attention maps
        weight = torch.softmax(self.fusion_weight, dim=0)
        fused_att = weight[0] * global_att + weight[1] * local_att
        # Gate the two inputs with the fused attention
        out = x * fused_att + y * (1 - fused_att)
        return out
```
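The final step `out = x * a + y * (1 - a)` is a convex combination gated by the attention value. A scalar sketch of the weight arithmetic (pure Python; shapes, convolutions, and sigmoids omitted, so the attention values are given directly):

```python
import math

def softmax(v):
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

def aff_blend(x, y, global_att, local_att, fusion_weight):
    """Scalar stand-in for AFF: combine two attention values, then gate x vs y."""
    w = softmax(fusion_weight)
    a = w[0] * global_att + w[1] * local_att  # fused attention in [0, 1]
    return x * a + y * (1 - a)

# Equal fusion weights (the module's initialization) average the two attentions:
print(aff_blend(x=2.0, y=4.0, global_att=0.9, local_att=0.5, fusion_weight=[1.0, 1.0]))
# a = 0.5*0.9 + 0.5*0.5 = 0.7  →  2.0*0.7 + 4.0*0.3 = 2.6
```

Because `a` stays in [0, 1], the output never leaves the range spanned by the two inputs; the network only decides how much of each to keep.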

3.4 Atrous Spatial Pyramid Pooling (ASPP)

The ASPP module applies parallel atrous (dilated) convolutions with different dilation rates to extract multi-scale context, enlarging the receptive field without adding parameters. For large objects in aerial images (such as buildings), ASPP captures global context and suppresses background clutter.
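A dilated 3x3 convolution with rate d behaves like a sparse kernel of effective size k + (k-1)(d-1), which is why the rates 6, 12, and 18 used below cover such different context ranges. A quick check of the arithmetic (pure Python; this is the standard dilation formula, independent of any framework):

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective receptive size (per side) of a k x k conv with dilation d."""
    return k + (k - 1) * (d - 1)

for rate in [1, 6, 12, 18]:
    print(rate, effective_kernel(3, rate))
# 1 -> 3, 6 -> 13, 12 -> 25, 18 -> 37 pixels per side
```

All four branches keep the same 9 weights per channel pair; only the sampling stride inside the kernel grows.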

PyTorch implementation

```python
class ASPPModule(nn.Module):
    def __init__(self, in_channels, out_channels, rates=(6, 12, 18)):
        super(ASPPModule, self).__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Atrous convolution branches
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[0], dilation=rates[0]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[1], dilation=rates[1]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[2], dilation=rates[2]),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Global average pooling branch
        self.branch5 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        # Fusion convolutions
        self.fusion = nn.Sequential(
            nn.Conv2d(out_channels * 5, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )

    def forward(self, x):
        size = x.shape[2:]
        # Compute each branch
        out1 = self.branch1(x)
        out2 = self.branch2(x)
        out3 = self.branch3(x)
        out4 = self.branch4(x)
        # The global branch must be upsampled back to the input size
        out5 = self.branch5(x)
        out5 = F.interpolate(out5, size=size, mode='bilinear', align_corners=False)
        # Concatenate and fuse
        out = torch.cat([out1, out2, out3, out4, out5], dim=1)
        out = self.fusion(out)
        return out
```

3.5 The Improved Neck

The modules above are integrated into the YOLOv8 Neck to build the enhanced feature pyramid (note that ASPP operates on features that have already been projected to `channels[0]` channels):

```python
class ImprovedYOLOv8Neck(nn.Module):
    def __init__(self, channels=(256, 512, 768, 768), num_levels=4):
        super(ImprovedYOLOv8Neck, self).__init__()
        # BiFPN layers
        self.bifpn1 = BiFPNBlock(channels[0], num_levels)
        self.bifpn2 = BiFPNBlock(channels[0], num_levels)
        # AFF modules
        self.aff_modules = nn.ModuleList([
            AFFModule(channels[0]) for _ in range(num_levels - 1)
        ])
        # ASPP module (applied to the deepest level, after channel projection)
        self.aspp = ASPPModule(channels[0], channels[0])
        # Channel projection layers
        self.conv_down = nn.ModuleList([
            nn.Conv2d(channels[i], channels[0], 1) for i in range(1, len(channels))
        ])
        self.conv_up = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i], 1) for i in range(1, len(channels))
        ])

    def forward(self, features):
        # features from the backbone: [P3, P4, P5, P6]
        # Project everything to a common channel count
        proj_features = [features[0]]
        for i, feat in enumerate(features[1:]):
            proj_features.append(self.conv_down[i](feat))
        # First BiFPN pass
        bifpn_out1 = self.bifpn1(proj_features)
        # AFF feature enhancement
        aff_out = []
        for i in range(len(bifpn_out1) - 1):
            aff_out.append(self.aff_modules[i](bifpn_out1[i], bifpn_out1[i + 1]))
        aff_out.append(bifpn_out1[-1])
        # ASPP on the deepest features
        aff_out[-1] = self.aspp(aff_out[-1])
        # Second BiFPN pass
        bifpn_out2 = self.bifpn2(aff_out)
        # Restore the original channel counts
        final_features = [bifpn_out2[0]]
        for i, feat in enumerate(bifpn_out2[1:]):
            final_features.append(self.conv_up[i](feat))
        return final_features
```

4. Complete Code Implementation

4.1 Environment Setup

```bash
# Create a conda environment
conda create -n yolo_aerial python=3.9
conda activate yolo_aerial

# Install PyTorch (pick the build matching your CUDA version)
pip install torch==2.0.0 torchvision==0.15.0

# Install Ultralytics YOLOv8
pip install ultralytics

# Other dependencies
pip install numpy opencv-python tqdm tensorboard pyyaml
```

4.2 Defining the Improved YOLOv8 Model

Create models/improved_yolov8.py (the config-parsing logic follows the structure of Ultralytics' own `parse_model`; module names in the YAML must resolve either to `torch.nn` or to this file's globals):

```python
import math
from copy import deepcopy

import torch
import torch.nn as nn
import yaml

from ultralytics.nn.modules import (C1, C2, C3, C3TR, SPP, SPPF, Bottleneck,
                                    BottleneckCSP, C2f, C3Ghost, C3x, Concat,
                                    Conv, DWConv, DWConvTranspose2d, Focus,
                                    GhostBottleneck, GhostConv, RepC3, RepConv,
                                    Detect, Segment)


class ImprovedYOLOv8(nn.Module):
    def __init__(self, cfg='yolov8n.yaml', ch=3, nc=None):
        super().__init__()
        # Load the model config
        with open(cfg, encoding='utf-8') as f:
            self.yaml = yaml.safe_load(f)
        if nc is not None:
            self.yaml['nc'] = nc
        # Build the model
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch)
        self.names = {i: f'class{i}' for i in range(self.yaml['nc'])}
        self.inplace = self.yaml.get('inplace', True)
        self.init_weights()

    def forward(self, x):
        y = []
        for m in self.model:
            if m.f != -1:
                # Gather this layer's inputs from earlier outputs
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
            x = m(x)
            y.append(x if m.i in self.save else None)
        return x

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def parse_model(d, ch):
    """Parse a YOLO model config dict into an nn.Sequential."""
    nc = d.get('nc')
    ch = [ch]
    layers, save = [], []
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):
        # Resolve the module class: 'nn.Upsample' -> torch.nn, otherwise a global name
        m = getattr(nn, m[3:]) if isinstance(m, str) and m.startswith('nn.') else globals()[m]
        n = max(round(n * d.get('depth_multiple', 1.0)), 1) if n > 1 else n
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv,
                 Focus, BottleneckCSP, C1, C2, C2f, C3, C3TR, C3Ghost,
                 nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, RepConv):
            c1, c2 = ch[f], args[0]
            if c2 != nc:
                c2 = make_divisible(c2 * d.get('width_multiple', 1.0), 8)
            args = [c1, c2, *args[1:]]
            if m in (C2f, C3, C3TR, C3Ghost, C3x, RepC3):
                args.insert(2, n)  # the repeat count goes inside the block
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m in (Detect, Segment):
            args.append([ch[x] for x in f])
            c2 = ch[f[-1]]
        else:
            c2 = ch[f]
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)
        m_.i, m_.f, m_.type = i, f, str(m)
        m_.np = sum(x.numel() for x in m_.parameters())
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
```

4.3 Concrete Implementation of the Improved Neck

Create models/improved_neck.py (as in Section 3.3, `AFFModule.forward` aligns the two inputs spatially before gating them):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from ultralytics.nn.modules import Conv


class BiFPNBlock(nn.Module):
    """Weighted bidirectional feature pyramid block."""

    def __init__(self, channels, num_levels=4):
        super().__init__()
        self.channels = channels
        self.num_levels = num_levels
        # Learnable fusion weights
        self.w_topdown = nn.Parameter(torch.ones(2, num_levels))
        self.w_bottomup = nn.Parameter(torch.ones(3, num_levels - 2))
        # Convolution layers
        self.conv_topdown = nn.ModuleList([
            Conv(channels, channels, 3, 1, 1) for _ in range(num_levels)
        ])
        self.conv_bottomup = nn.ModuleList([
            Conv(channels, channels, 3, 1, 1) for _ in range(num_levels - 2)
        ])

    def forward(self, feats):
        # feats: list of features [P3, P4, P5, P6]
        # Normalize the weights
        w_td = F.relu(self.w_topdown)
        w_td = w_td / (w_td.sum(dim=0, keepdim=True) + 1e-5)
        w_bu = F.relu(self.w_bottomup)
        w_bu = w_bu / (w_bu.sum(dim=0, keepdim=True) + 1e-5)
        # Top-down path
        td_feats = [feats[-1]]
        for i in range(self.num_levels - 2, -1, -1):
            # Upsample and fuse
            up_feat = F.interpolate(td_feats[0], size=feats[i].shape[2:], mode='nearest')
            fused = w_td[0, i] * feats[i] + w_td[1, i] * up_feat
            td_feats.insert(0, self.conv_topdown[i](fused))
        # Bottom-up path
        bu_feats = [td_feats[0]]
        for i in range(1, self.num_levels - 1):
            # Downsample, then fuse the top-down feature, the downsampled
            # feature, and the raw input of this level
            down_feat = F.max_pool2d(bu_feats[-1], 2)
            fused = (w_bu[0, i - 1] * td_feats[i]
                     + w_bu[1, i - 1] * down_feat
                     + w_bu[2, i - 1] * feats[i])
            bu_feats.append(self.conv_bottomup[i - 1](fused))
        bu_feats.append(td_feats[-1])
        return bu_feats


class AFFModule(nn.Module):
    """Adaptive feature fusion module."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channels = channels
        # Global attention branch
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )
        # Local attention branch
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
            nn.Sigmoid()
        )
        # Fusion weights
        self.fusion_weight = nn.Parameter(torch.ones(2))

    def forward(self, low_feat, high_feat):
        B, C, H, W = low_feat.shape
        # Bring the deep feature to the shallow feature's resolution
        if high_feat.shape[2:] != low_feat.shape[2:]:
            high_feat = F.interpolate(high_feat, size=low_feat.shape[2:], mode='nearest')
        # Global attention (from the deep feature)
        global_att = self.gap(high_feat).view(B, C)
        global_att = self.global_fc(global_att).view(B, C, 1, 1)
        # Local attention (from the shallow feature)
        local_att = self.local_conv(low_feat)
        # Combine the attention maps
        weight = torch.softmax(self.fusion_weight, dim=0)
        fused_att = weight[0] * global_att + weight[1] * local_att
        # Gate and merge
        enhanced_low = low_feat * fused_att
        enhanced_high = high_feat * (1 - fused_att)
        return enhanced_low + enhanced_high


class ASPPModule(nn.Module):
    """Atrous spatial pyramid pooling."""

    def __init__(self, in_channels, out_channels, rates=(6, 12, 18)):
        super().__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Atrous convolution branches
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[0], dilation=rates[0], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[1], dilation=rates[1], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=rates[2], dilation=rates[2], bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Global average pooling branch
        self.branch5 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )
        # Fusion convolutions
        self.fusion = nn.Sequential(
            nn.Conv2d(out_channels * 5, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        size = x.shape[2:]
        # Compute each branch
        out1 = self.branch1(x)
        out2 = self.branch2(x)
        out3 = self.branch3(x)
        out4 = self.branch4(x)
        out5 = self.branch5(x)
        out5 = F.interpolate(out5, size=size, mode='bilinear', align_corners=False)
        # Concatenate and fuse
        out = torch.cat([out1, out2, out3, out4, out5], dim=1)
        return self.fusion(out)


class ImprovedNeck(nn.Module):
    """The improved feature pyramid Neck."""

    def __init__(self, channels=(256, 512, 768, 768), num_levels=4):
        super().__init__()
        self.num_levels = num_levels
        # Projection layers (unify channel counts)
        self.proj_convs = nn.ModuleList([
            nn.Conv2d(channels[i], channels[0], 1) for i in range(1, len(channels))
        ])
        # BiFPN layers
        self.bifpn1 = BiFPNBlock(channels[0], num_levels)
        self.bifpn2 = BiFPNBlock(channels[0], num_levels)
        # AFF modules
        self.aff_modules = nn.ModuleList([
            AFFModule(channels[0]) for _ in range(num_levels - 1)
        ])
        # ASPP module
        self.aspp = ASPPModule(channels[0], channels[0])
        # Restore channel counts
        self.restore_convs = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i], 1) for i in range(1, len(channels))
        ])

    def forward(self, feats):
        # feats: [P3, P4, P5, P6] from the backbone
        # 1. Unify channel counts
        proj_feats = [feats[0]]
        for i, feat in enumerate(feats[1:]):
            proj_feats.append(self.proj_convs[i](feat))
        # 2. First BiFPN pass
        bifpn_out1 = self.bifpn1(proj_feats)
        # 3. AFF feature enhancement
        aff_out = []
        for i in range(self.num_levels - 1):
            aff_out.append(self.aff_modules[i](bifpn_out1[i], bifpn_out1[i + 1]))
        aff_out.append(bifpn_out1[-1])
        # 4. ASPP on the deepest level, with a residual connection
        aff_out[-1] = self.aspp(aff_out[-1]) + aff_out[-1]
        # 5. Second BiFPN pass
        bifpn_out2 = self.bifpn2(aff_out)
        # 6. Restore the original channel counts
        final_feats = [bifpn_out2[0]]
        for i, feat in enumerate(bifpn_out2[1:]):
            final_feats.append(self.restore_convs[i](feat))
        return final_feats
```

4.4 Training Script

Create train.py (the model is built from the custom YAML that wires in the improved Neck, then the pretrained weights are transferred):

```python
import argparse

from ultralytics import YOLO


def train_improved_yolo():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default='VisDrone.yaml', help='dataset config')
    parser.add_argument('--weights', type=str, default='yolov8n.pt', help='initial weights path')
    parser.add_argument('--epochs', type=int, default=300, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size')
    parser.add_argument('--imgsz', type=int, default=640, help='image size')
    parser.add_argument('--device', type=str, default='', help='cuda device')
    parser.add_argument('--project', type=str, default='runs/train', help='project name')
    parser.add_argument('--name', type=str, default='exp', help='experiment name')
    args = parser.parse_args()

    # Build from the custom config, then load the pretrained weights
    model = YOLO('models/improved_yolov8.yaml').load(args.weights)

    results = model.train(
        data=args.data,
        epochs=args.epochs,
        batch=args.batch_size,
        imgsz=args.imgsz,
        device=args.device,
        project=args.project,
        name=args.name,
        # Data augmentation
        hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
        degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0,
        flipud=0.0, fliplr=0.5,
        mosaic=1.0, mixup=0.0, copy_paste=0.0,
        # Optimizer
        optimizer='SGD', lr0=0.01, lrf=0.01,
        momentum=0.937, weight_decay=0.0005,
        warmup_epochs=3, warmup_momentum=0.8, warmup_bias_lr=0.1,
        # Loss weights
        box=7.5, cls=0.5, dfl=1.5,
        # Training details
        label_smoothing=0.0, nbs=64,
        overlap_mask=True, mask_ratio=4, dropout=0.0,
        val=True, plots=True,
    )
    return results


if __name__ == '__main__':
    train_improved_yolo()
```

4.5 Validation Script

Create val.py:

```python
import argparse

from ultralytics import YOLO


def validate():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, required=True, help='model path')
    parser.add_argument('--data', type=str, default='VisDrone.yaml', help='dataset config')
    parser.add_argument('--batch-size', type=int, default=16, help='batch size')
    parser.add_argument('--imgsz', type=int, default=640, help='image size')
    parser.add_argument('--device', type=str, default='', help='cuda device')
    args = parser.parse_args()

    model = YOLO(args.weights)
    metrics = model.val(
        data=args.data,
        batch=args.batch_size,
        imgsz=args.imgsz,
        device=args.device,
        plots=True,
        save_json=True,
    )
    print(f"mAP50: {metrics.box.map50:.4f}")
    print(f"mAP50-95: {metrics.box.map:.4f}")


if __name__ == '__main__':
    validate()
```
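Both mAP50 and mAP50-95 are built on box IoU (intersection over union): a prediction counts as a true positive only if its IoU with a ground-truth box clears the threshold. A minimal pure-Python IoU for sanity-checking matches by hand (a sketch, not the Ultralytics implementation):

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by 5 pixels horizontally:
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50/150 ≈ 0.333, below the 0.5 mAP50 threshold
```

For a 10-pixel object, a 5-pixel localization error already fails the mAP50 threshold, which is exactly why small-object metrics are so sensitive to the Neck design.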

5. Experimental Setup and Datasets

5.1 Datasets

5.1.1 VisDrone

VisDrone is a large-scale aerial dataset collected by the Machine Learning and Data Mining Lab at Tianjin University, containing 288 video clips and 10,209 static images covering diverse scenes across 14 Chinese cities. The annotated categories are:

  • pedestrian

  • people

  • bicycle

  • car

  • van

  • truck

  • tricycle

  • awning-tricycle

  • bus

  • motor

Dataset characteristics:

  • Image resolution: up to 2000x1500

  • Object scale: many small objects, some only about 10x10 pixels

  • Scene diversity: urban, rural, highway, plaza, and more

Dataset download

```bash
# Official download page
# https://github.com/VisDrone/VisDrone-Dataset

# Or download directly
wget https://download.visdrone.org/VisDrone2019-DET-train.zip
wget https://download.visdrone.org/VisDrone2019-DET-val.zip
wget https://download.visdrone.org/VisDrone2019-DET-test-dev.zip
```
5.1.2 DIOR

DIOR is a large-scale dataset for object detection in optical remote-sensing images, containing 23,463 images and 192,472 instances across 20 categories:

  • airplane

  • airport

  • baseballfield

  • basketballcourt

  • bridge

  • chimney

  • dam

  • Expressway-Service-area

  • Expressway-toll-station

  • golffield

  • groundtrackfield

  • harbor

  • overpass

  • ship

  • stadium

  • storagetank

  • tenniscourt

  • trainstation

  • vehicle

  • windmill

Dataset characteristics:

  • Image resolution: 800x800

  • Scene variation: large scale changes across different seasons and weather conditions

  • Object diversity: both man-made and natural targets

5.2 Dataset Preprocessing

Convert the datasets to YOLO format:

```python
# prepare_dataset.py
import os

import cv2
from tqdm import tqdm


def convert_visdrone_to_yolo(visdrone_path, output_path):
    """Convert VisDrone annotations to YOLO format."""
    img_dir = os.path.join(visdrone_path, 'images')
    ann_dir = os.path.join(visdrone_path, 'annotations')
    os.makedirs(os.path.join(output_path, 'images'), exist_ok=True)
    os.makedirs(os.path.join(output_path, 'labels'), exist_ok=True)

    # VisDrone raw categories:
    # 0: ignored, 1: pedestrian, 2: people, 3: bicycle, 4: car,
    # 5: van, 6: truck, 7: tricycle, 8: awning-tricycle, 9: bus,
    # 10: motor, 11: others
    valid_classes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # drop 0 and 11
    class_map = {cls: idx for idx, cls in enumerate(valid_classes)}

    ann_files = [f for f in os.listdir(ann_dir) if f.endswith('.txt')]
    for ann_file in tqdm(ann_files, desc='Converting'):
        img_file = ann_file.replace('.txt', '.jpg')
        img_path = os.path.join(img_dir, img_file)
        if not os.path.exists(img_path):
            continue
        # Read the image to get its size
        img = cv2.imread(img_path)
        if img is None:
            continue
        h, w = img.shape[:2]

        # Read the annotations
        with open(os.path.join(ann_dir, ann_file), 'r') as f:
            lines = f.readlines()

        yolo_labels = []
        for line in lines:
            parts = line.strip().split(',')
            if len(parts) < 8:
                continue
            obj_class = int(parts[5])
            if obj_class not in valid_classes:
                continue
            # VisDrone boxes are [x, y, w, h] in pixels
            bbox = [int(parts[0]), int(parts[1]), int(parts[2]), int(parts[3])]
            # Convert to normalized YOLO format [class_id, x_center, y_center, width, height]
            x_center = (bbox[0] + bbox[2] / 2) / w
            y_center = (bbox[1] + bbox[3] / 2) / h
            width = bbox[2] / w
            height = bbox[3] / h
            # Clamp coordinates to [0, 1]
            x_center = max(0, min(1, x_center))
            y_center = max(0, min(1, y_center))
            width = max(0, min(1, width))
            height = max(0, min(1, height))
            yolo_labels.append(
                f"{class_map[obj_class]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")

        # Save the image and its labels
        cv2.imwrite(os.path.join(output_path, 'images', img_file), img)
        with open(os.path.join(output_path, 'labels', ann_file), 'w') as f:
            f.write('\n'.join(yolo_labels))


if __name__ == '__main__':
    convert_visdrone_to_yolo('VisDrone2019-DET-train', 'VisDrone_YOLO/train')
    convert_visdrone_to_yolo('VisDrone2019-DET-val', 'VisDrone_YOLO/val')
```
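The coordinate conversion at the heart of the script is easy to verify on a single box (pure Python; the image size and box below are made-up illustrative values):

```python
def to_yolo(x, y, bw, bh, img_w, img_h):
    """Convert a top-left [x, y, w, h] pixel box to normalized YOLO center format."""
    return (
        (x + bw / 2) / img_w,  # x_center
        (y + bh / 2) / img_h,  # y_center
        bw / img_w,            # width
        bh / img_h,            # height
    )

# A 100x50 box whose top-left corner is at (300, 200) in a 2000x1500 image:
print(to_yolo(300, 200, 100, 50, 2000, 1500))
# (0.175, 0.15, 0.05, 0.0333...)
```

Note that VisDrone stores the top-left corner while YOLO expects the center, which is why the half-width and half-height offsets are added before normalizing.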

5.3 Dataset Config Files

Create VisDrone.yaml:

```yaml
# VisDrone dataset config
path: ./VisDrone_YOLO  # dataset root
train: images/train    # training images
val: images/val        # validation images
test: images/test      # test images

# Number of classes
nc: 10

# Class names
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

Create DIOR.yaml:

```yaml
# DIOR dataset config
path: ./DIOR_YOLO
train: images/train
val: images/val
test: images/test

nc: 20

names:
  0: airplane
  1: airport
  2: baseballfield
  3: basketballcourt
  4: bridge
  5: chimney
  6: dam
  7: Expressway-Service-area
  8: Expressway-toll-station
  9: golffield
  10: groundtrackfield
  11: harbor
  12: overpass
  13: ship
  14: stadium
  15: storagetank
  16: tenniscourt
  17: trainstation
  18: vehicle
  19: windmill
```
