From Faster R-CNN to Oriented R-CNN: Hands-On Rotated Object Detection on the DOTA Dataset (with a Complete Training Configuration)

news 2026/5/29 6:07:43

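Ships and buildings in remote-sensing imagery, or tilted vehicles in autonomous-driving scenes, often cannot be tightly enclosed by an axis-aligned rectangle. Conventional horizontal-box detectors either pull in a large amount of background or fail to describe the object's actual orientation and shape. Rotated object detection addresses exactly this problem.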

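1. Rotated Object Detection Basics and Environment Setup

The main difference between rotated and conventional object detection lies in how the bounding box is represented. A horizontal box is usually written as (x,y,w,h), the center coordinates plus width and height, whereas a rotated box needs extra parameters to describe its orientation. Common representations are:

Four-point representation: (x1,y1,x2,y2,x3,y3,x4,y4), the coordinates of the four corners
Rotated-rectangle representation: (x,y,w,h,θ), where θ is the rotation angle
Midpoint-offset representation: (x,y,w,h,Δα,Δβ), where the offsets Δα and Δβ describe the rotation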
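The relation between the rotated-rectangle and four-point representations can be sketched as follows. This is a minimal illustration, not MMRotate's implementation: the function name `rbox_to_corners` is hypothetical, and it assumes θ is given in radians and measured from the x-axis toward the y-axis (which looks clockwise on screen because image y grows downward).

```python
import math


def rbox_to_corners(cx, cy, w, h, theta):
    """Convert (cx, cy, w, h, theta) into four corner points.

    Assumption: theta is in radians, measured from the x-axis toward the
    y-axis. The corners start at the box's own top-left corner.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own (unrotated) frame.
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset and translate it to the box center.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in local]


if __name__ == "__main__":
    for point in rbox_to_corners(10, 10, 4, 2, math.pi / 6):
        print(point)
```

Going the other way (four points to a rotated rectangle) is not unique for arbitrary quadrilaterals, which is one reason frameworks fix an angle convention such as le90 up front.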

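DOTA annotations use the four-point representation, so the labels have to be converted before they can train a model that regresses (x,y,w,h,θ) or midpoint offsets. We use MMRotate, the OpenMMLab toolbox dedicated to rotated object detection, as the base framework.

Installation steps: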

```shell
conda create -n mmrotate python=3.8 -y
conda activate mmrotate
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.5 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
git clone https://github.com/open-mmlab/mmrotate.git
cd mmrotate
pip install -r requirements/build.txt
pip install -v -e .
```

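Note: the CUDA version must match the PyTorch build, and the mmcv-full wheel must be built for that same PyTorch/CUDA pair (torch1.9.0 and cu111 above). A mismatch leads to training-time errors that are hard to diagnose.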
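One way to spot a mismatch early is to list the installed package versions and their build tags before training. The sketch below uses only the Python standard library; `cuda_tag` and `installed_versions` are hypothetical helper names, and the build tag is only present on wheels whose version carries a local suffix such as `+cu111`.

```python
from importlib import metadata


def cuda_tag(version):
    """Return the local build tag of a version string, e.g. 'cu111', or None."""
    _, _, local = version.partition("+")
    return local or None


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["torch", "torchvision", "mmcv-full", "mmrotate"]
    for name, version in installed_versions(names).items():
        tag = cuda_tag(version) if version else None
        print(f"{name}: {version or 'not installed'} (build tag: {tag})")
```

If torch and torchvision report different build tags, reinstall them from the same wheel index before going further.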

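2. DOTA Dataset Processing and Dataset-Specific Configuration

DOTA is one of the largest benchmarks for rotated object detection in aerial images. DOTA-v1.0 contains 2,806 images and 188,282 instances across 15 categories. What sets it apart:

Very large images (up to about 4,000×4,000 pixels)
Objects with arbitrary orientations, often densely packed
Annotations in the four-point coordinate format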
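In the original DOTA label files, each object is one text line: eight corner coordinates, the category name, and a difficulty flag, with optional header lines such as `imagesource:` and `gsd:` at the top. A minimal parser might look like the sketch below; `parse_dota_line` is a hypothetical helper name, and treating a missing difficulty flag as 0 is an assumption of this sketch.

```python
def parse_dota_line(line):
    """Parse one DOTA label line into (polygon, category, difficult).

    Returns None for header lines such as 'imagesource:...' or 'gsd:...'.
    """
    parts = line.split()
    if len(parts) < 9:
        return None
    coords = [float(value) for value in parts[:8]]
    # Pair up x and y values into four (x, y) corner points.
    polygon = list(zip(coords[0::2], coords[1::2]))
    category = parts[8]
    # Assumption: a missing difficulty flag is treated as 0 (not difficult).
    difficult = int(parts[9]) if len(parts) > 9 else 0
    return polygon, category, difficult


if __name__ == "__main__":
    print(parse_dota_line("2753 2408 2861 2385 2888 2468 2805 2502 plane 0"))
```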

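Key data-processing steps:

Split the original images into 600×600 patches with the official tools
Convert the annotations into a format that MMRotate supports
Handle class imbalance (for example, the "harbor" class has far fewer instances than the vehicle classes)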
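The splitting step boils down to computing overlapping window positions along each image axis. The sketch below shows that arithmetic only; it is not the official splitting tool. `window_starts` is a hypothetical helper, and the 100-pixel overlap is an assumed value, since the text above fixes only the 600×600 patch size.

```python
def window_starts(length, size=600, overlap=100):
    """Left/top offsets of sliding windows that cover `length` pixels.

    Assumption: `overlap` (100 px here) is illustrative; choose it so that
    the largest objects of interest fit inside the overlap region.
    """
    if length <= size:
        return [0]
    stride = size - overlap
    starts = list(range(0, length - size, stride))
    # Add a final window flush with the border so no pixels are skipped.
    starts.append(length - size)
    return starts


if __name__ == "__main__":
    xs = window_starts(4000)
    ys = window_starts(4000)
    tiles = [(x, y) for y in ys for x in xs]
    print(len(tiles), "patches, first:", tiles[0], "last:", tiles[-1])
```

Objects cut by a patch border need their polygons clipped or dropped, which the official tools handle and this sketch does not.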

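```python
# Annotation format conversion example.
# Assumes the labels were already exported to a COCO-style JSON file whose
# 'bbox' field holds the eight corner coordinates.
import json
from collections import defaultdict

import mmcv
import numpy as np


def dotav2_to_mmrotate(ann_file, out_file):
    with open(ann_file) as f:
        data = json.load(f)

    # Group annotations by image id once instead of rescanning per image.
    anns_by_image = defaultdict(list)
    for obj in data['annotations']:
        anns_by_image[obj['image_id']].append(obj)

    data_infos = []
    for img_info in data['images']:
        anns = anns_by_image[img_info['id']]
        # bbox: [x1, y1, x2, y2, x3, y3, x4, y4]
        bboxes = [ann['bbox'] for ann in anns]
        labels = [ann['category_id'] for ann in anns]
        data_infos.append({
            'filename': img_info['file_name'],
            'width': img_info['width'],
            'height': img_info['height'],
            'ann': {
                # reshape keeps an (N, 8) array even for images without objects
                'bboxes': np.array(bboxes, dtype=np.float32).reshape(-1, 8),
                'labels': np.array(labels, dtype=np.int64),
            },
        })
    mmcv.dump(data_infos, out_file)
```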

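3. Oriented R-CNN Architecture

Oriented R-CNN makes three key changes to Faster R-CNN:

Oriented RPN: generates oriented proposals
Rotated RoI Align: extracts features aligned with the rotated region
Midpoint-offset representation: a more stable regression target for rotated boxes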

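3.1 Oriented RPN Design Details

A conventional RPN regresses four values (x,y,w,h) per anchor; the Oriented RPN extends this to six, (x,y,w,h,Δα,Δβ). Because no angle is regressed directly, this design avoids the boundary discontinuity that appears when the angle wraps around at the ends of its range.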

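Anchor settings compared:

Parameter	Conventional RPN	Oriented RPN
Anchor type	Horizontal rectangle	Horizontal rectangle
Regression dimensions	4	6
Angle handling	None	Midpoint offsets
Computational cost	Low	Medium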
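To make the six-parameter midpoint-offset representation concrete, the sketch below decodes (x,y,w,h,Δα,Δβ) into four vertices, following the definition in the Oriented R-CNN paper: (x,y,w,h) is the external horizontal box, Δα shifts the top and bottom vertices along x, and Δβ shifts the right and left vertices along y. The function name `decode_midpoint_offset` is hypothetical, and this is not MMRotate's `MidpointOffsetCoder`.

```python
def decode_midpoint_offset(x, y, w, h, d_alpha, d_beta):
    """Return the four vertices described by (x, y, w, h, d_alpha, d_beta).

    Each vertex sits on one side of the external horizontal box and is
    shifted away from that side's midpoint by the corresponding offset.
    """
    top = (x + d_alpha, y - h / 2)
    right = (x + w / 2, y + d_beta)
    bottom = (x - d_alpha, y + h / 2)
    left = (x - w / 2, y - d_beta)
    return [top, right, bottom, left]


if __name__ == "__main__":
    for vertex in decode_midpoint_offset(10, 10, 8, 6, 2, -1):
        print(vertex)
```

The decoded shape is in general a parallelogram; in the paper it is then adjusted to an oriented rectangle before RoI feature extraction, a step this sketch leaves out.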

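3.2 Rotated RoI Align Implementation

This is the most critical and most error-prone module in the model. The core idea is:

Convert the proposal predicted through Δα and Δβ into an oriented rectangle and derive its rotation angle θ and rotation matrix
Apply the rotation to the sampling grid of each RoI
Sample the feature map by bilinear interpolation in the rotated coordinate frame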

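```python
# Core logic of Rotated RoI Align (pseudocode: calculate_rotation,
# apply_rotation, generate_grid_points and bilinear_sample are placeholders,
# not library functions)
def rotated_roi_align(features, rois, output_size):
    theta = calculate_rotation(rois)  # rotation angle derived from Δα, Δβ
    rotated_rois = apply_rotation(rois, theta)
    grid = generate_grid_points(rotated_rois, output_size)
    sampled_features = bilinear_sample(features, grid)
    return sampled_features
```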

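Tip: in a real implementation, take particular care with sampling points that fall outside the feature map after rotation, so that no out-of-bounds access occurs.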
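The sampling idea and the boundary handling can be illustrated with a small pure-Python sketch on a single-channel feature map. It is a simplification under stated assumptions, not the CUDA operator used by MMRotate: it takes one sample per output bin at the bin center, expects the RoI already in feature-map coordinates as (cx, cy, w, h, θ in radians), and returns 0.0 for points outside the map. The names `bilinear` and `rotated_roi_sample` are hypothetical.

```python
import math


def bilinear(feature, x, y):
    """Bilinearly sample a 2-D list `feature[row][col]` at (x, y).

    Points outside the map return 0.0 instead of indexing out of bounds.
    """
    rows, cols = len(feature), len(feature[0])
    if x < 0 or y < 0 or x > cols - 1 or y > rows - 1:
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the upper neighbours so the last row/column stays in range.
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = feature[y0][x0] * (1 - fx) + feature[y0][x1] * fx
    bottom = feature[y1][x0] * (1 - fx) + feature[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rotated_roi_sample(feature, cx, cy, w, h, theta, out_size):
    """Sample an out_size x out_size grid inside a rotated RoI."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Bin center in the RoI's own (unrotated) frame.
            dx = (j + 0.5) / out_size * w - w / 2
            dy = (i + 0.5) / out_size * h - h / 2
            # Rotate into feature-map coordinates.
            px = cx + dx * cos_t - dy * sin_t
            py = cy + dx * sin_t + dy * cos_t
            row.append(bilinear(feature, px, py))
        output.append(row)
    return output


if __name__ == "__main__":
    fmap = [[4 * r + c for c in range(4)] for r in range(4)]
    print(rotated_roi_sample(fmap, 1.5, 1.5, 2, 2, math.pi / 4, 2))
```

Production operators average several samples per bin and run over all channels and RoIs in parallel; the coordinate transform is the part this sketch is meant to show.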

4. Complete Training Configuration and Tuning Strategy

The following configuration worked well in our experiments:

Base configuration:

```python
# oriented_rcnn_r50_fpn_1x_dota_le90.py
# Model section only; train_cfg, test_cfg, dataset and schedule settings are
# omitted here (see the config of the same name shipped with MMRotate).
angle_version = 'le90'
model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        # Six regression targets: (dx, dy, dw, dh, dα, dβ)
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                output_size=7,
                sampling_ratio=0,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            # Five regression targets: (dx, dy, dw, dh, dθ)
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                target_means=(0., 0., 0., 0., 0.),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))))
```

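Tuning notes:

Learning rate: start at lr=0.005 and divide it by 10 every 15 epochs when training longer than the 12-epoch 1x schedule
Data augmentation: random rotation within (-30°,30°) gave the best results in our experiments
Positive-to-negative sample ratio: keeping it at 1:3 mitigates the imbalance between foreground and background samples
Multi-scale training: pick the short side at random from [400,600,800] pixels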
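The learning-rate and multi-scale notes translate into very little code. The sketch below is a framework-independent illustration; `step_lr` and `pick_short_side` are hypothetical names, and in MMRotate the same behavior would be expressed through the `lr_config` and pipeline sections of the config rather than through functions like these.

```python
import random


def step_lr(epoch, base_lr=0.005, step=15, gamma=0.1):
    """Learning rate after multiplying by `gamma` every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def pick_short_side(rng, choices=(400, 600, 800)):
    """Pick the short-side length for one training sample."""
    return rng.choice(choices)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for epoch in (0, 14, 15, 30):
        print(epoch, step_lr(epoch), pick_short_side(rng))
```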

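5. Model Evaluation and Result Visualization

DOTA uses mAP (mean Average Precision) as its main metric. A detection counts as correct when its IoU with a ground-truth box, computed on the rotated boxes, reaches the chosen threshold; the table below reports mAP at an IoU threshold of 0.5.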
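The IoU behind this metric is computed on rotated boxes, so it needs a polygon intersection rather than the usual min/max arithmetic. The following pure-Python sketch clips one convex quadrilateral against another with the Sutherland-Hodgman algorithm and measures areas with the shoelace formula. The function names are hypothetical and the code is meant for understanding and small sanity checks; for real evaluation, an optimized operator such as `mmcv.ops.box_iou_rotated` is the usual choice.

```python
def polygon_area(poly):
    """Signed area of a polygon by the shoelace formula."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0


def _side(a, b, p):
    """Positive if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_polygon(subject, clip):
    """Clip `subject` by the convex polygon `clip` (positive signed area)."""
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        inputs, output = output, []
        for p, q in zip(inputs, inputs[1:] + inputs[:1]):
            sp, sq = _side(a, b, p), _side(a, b, q)
            if sp >= 0:  # p is inside or on the clip edge: keep it
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                # The edge p -> q crosses the clip line: add the crossing.
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(poly_a, poly_b):
    """IoU of two convex polygons given as lists of (x, y) points."""
    # Normalize both polygons to positive orientation.
    if polygon_area(poly_a) < 0:
        poly_a = poly_a[::-1]
    if polygon_area(poly_b) < 0:
        poly_b = poly_b[::-1]
    inter = clip_polygon(poly_a, poly_b)
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = polygon_area(poly_a) + polygon_area(poly_b) - inter_area
    return inter_area / union if union > 0 else 0.0


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    diamond = [(1, 0), (2, 1), (1, 2), (0, 1)]
    print(rotated_iou(square, diamond))
```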

Typical performance figures:

Method	mAP@0.5	Params (M)	Inference speed (FPS)
Faster R-CNN	58.2	41.5	12.3
RoI Trans.	69.8	45.2	9.7
Oriented R-CNN	75.6	43.1	11.2

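When inspecting visualizations, pay particular attention to these cases:

Densely packed ships
Vehicles with different orientations
Irregularly shaped building clusters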

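```python
# Result visualization example
import matplotlib.pyplot as plt
import numpy as np


def show_results(img, bboxes, labels, class_names, score_thr=0.5):
    """Draw rotated boxes; each bbox is [x1, y1, ..., x4, y4, score]."""
    plt.imshow(img)
    ax = plt.gca()
    for bbox, label in zip(bboxes, labels):
        bbox = np.asarray(bbox, dtype=float)
        if bbox[8] < score_thr:  # skip low-confidence detections
            continue
        poly = bbox[:8].reshape(4, 2)
        ax.add_patch(plt.Polygon(
            poly, fill=False, edgecolor='red', linewidth=2))
        text = f'{class_names[label]} {bbox[8]:.2f}'
        ax.text(poly[0, 0], poly[0, 1], text,
                bbox=dict(facecolor='yellow', alpha=0.5))
    plt.show()
```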

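In our projects, the model performed best on objects with small rotation angles (within ±15°) and tended to drift on objects rotated by close to 45°. We attribute this mainly to the scarcity of such samples in the dataset; targeted augmentation that adds more large-angle samples can improve it.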
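A targeted augmentation of this kind needs the labels rotated together with the image. The sketch below covers the label side only: it rotates polygon annotations around a given center by an angle drawn from a large-angle range. The names `rotate_polygon` and `sample_large_angle` are hypothetical, and the 30° to 60° range is an assumed example, not a value from our experiments.

```python
import math
import random


def rotate_polygon(polygon, angle_deg, center):
    """Rotate a list of (x, y) points by angle_deg around `center`."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [(cx + (x - cx) * cos_t - (y - cy) * sin_t,
             cy + (x - cx) * sin_t + (y - cy) * cos_t)
            for x, y in polygon]


def sample_large_angle(rng, low=30.0, high=60.0):
    """Draw an angle from an assumed large-angle range, with random sign."""
    return rng.uniform(low, high) * rng.choice((-1, 1))


if __name__ == "__main__":
    rng = random.Random(0)
    box = [(100, 100), (200, 100), (200, 140), (100, 140)]
    angle = sample_large_angle(rng)
    print(angle, rotate_polygon(box, angle, center=(300, 300)))
```

The image itself must be warped with the same angle and center, and polygons that leave the image after rotation should be clipped or discarded.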
