
A Step-by-Step Guide to Converting the DOTA Remote Sensing Dataset to COCO Format (with Complete Python Code and Visual Comparison)

news 2026/7/28 10:52:22

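In computer vision, getting data into a common format is often the first hurdle on the way to a working project. You find an impressive object detection model on GitHub, get ready to train it on your own dataset, and then discover that the framework only accepts COCO format, while your remote sensing annotations are in DOTA format. Anyone who works on remote sensing image analysis will recognize the situation.

DOTA, a benchmark dataset for object detection in remote sensing imagery, annotates objects with rotated bounding boxes, which can tightly enclose aircraft, ships, and other targets at arbitrary angles. Mainstream detection frameworks such as MMDetection and Detectron2, however, support only COCO-style axis-aligned bounding boxes by default. This article offers a ready-to-use solution, including:

- A complete Python conversion script: turns DOTA txt annotations into a COCO-format json file
- The math behind converting rotated boxes to horizontal boxes: a walk through the core coordinate conversion
- A visual comparison tool: check the conversion result by eye
- Practical engineering tips: path configuration, class mapping, and other pain points that show up in real deployments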

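1. Environment Setup and Understanding the Data

1.1 Installing the dependencies

The conversion script needs the following Python packages. Creating a dedicated conda virtual environment first is recommended:

    pip install pillow numpy matplotlib opencv-python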

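1.2 DOTA dataset layout

A typical DOTA dataset directory looks like this:

    DOTA/
    ├── images/
    │   ├── P0001.png
    │   └── P0002.png
    └── labelTxt/
        ├── P0001.txt
        └── P0002.txt

Example content of a single txt annotation file:

    588.0 438.0 599.0 426.0 618.0 437.0 607.0 449.0 small-vehicle 0
    322.0 544.0 332.0 535.0 347.0 545.0 337.0 554.0 large-vehicle 1

Each line describes one rotated box. The first 8 numbers are the quadrilateral's vertex coordinates (x1,y1,x2,y2,x3,y3,x4,y4), followed by the class name, and the final 0/1 flag marks whether the object is a difficult sample.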

1.3 Core COCO fields

The target COCO annotation file needs to contain the following (a schematic, with placeholders where values vary):

    {
        "images": [{
            "file_name": "P0001.png",
            "height": 1024,
            "width": 1024,
            "id": 0
        }],
        "annotations": [{
            "segmentation": [[x1,y1,x2,y1,x2,y2,x1,y2]],
            "bbox": [x,y,width,height],  # top-left corner + width and height
            "area": float,
            "category_id": int,
            "iscrowd": 0,
            "image_id": 0,
            "id": 1000
        }],
        "categories": [{
            "id": 1,
            "name": "small-vehicle"
        }]
    }

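2. Implementing the Core Conversion

2.1 The math behind rotated-to-horizontal conversion

A DOTA rotated box is essentially a convex quadrilateral. Converting it to a COCO horizontal box means computing its smallest enclosing axis-aligned rectangle:

- Extract the x and y coordinates of the four vertices
- Find the minimum and maximum x coordinates (x_min, x_max)
- Find the minimum and maximum y coordinates (y_min, y_max)
- Use (x_min, y_min) as the top-left corner and (x_max, y_max) as the bottom-right corner

The key code:

    def rotated_to_horizontal(points):
        """Convert a rotated box to a horizontal rectangle.

        Args:
            points: list of 8 floats [x1,y1,x2,y2,x3,y3,x4,y4]
        Returns:
            (x1, y1, x2, y2) opposite corners of the horizontal box
        """
        x_coords = points[::2]   # all x coordinates [x1,x2,x3,x4]
        y_coords = points[1::2]  # all y coordinates [y1,y2,y3,y4]
        return (
            min(x_coords), min(y_coords),
            max(x_coords), max(y_coords)
        )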
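As a worked example, the four steps above can be applied to the first sample annotation from section 1.2. This is a minimal sketch: `quad_to_coco_bbox` is a hypothetical helper name introduced here for illustration, and it returns the box directly in COCO's `[x, y, width, height]` layout together with its area.

```python
def quad_to_coco_bbox(points):
    """Return ([x, y, width, height], area) for a quad given as 8 floats."""
    xs, ys = points[0::2], points[1::2]
    x_min, y_min = min(xs), min(ys)
    x_max, y_max = max(xs), max(ys)
    width, height = x_max - x_min, y_max - y_min
    return [x_min, y_min, width, height], width * height


# First sample line from section 1.2 (a small-vehicle)
sample = [588.0, 438.0, 599.0, 426.0, 618.0, 437.0, 607.0, 449.0]
bbox, area = quad_to_coco_bbox(sample)
print(bbox, area)  # [588.0, 426.0, 30.0, 23.0] 690.0
```

The x values span 588 to 618 and the y values span 426 to 449, so the horizontal box is 30 wide and 23 high.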

2.2 The complete conversion script

    import json
    import os
    import os.path as osp

    from PIL import Image

    # rotated_to_horizontal() is the function defined in section 2.1


    def create_coco_template():
        """Create an empty COCO-format skeleton."""
        return {
            "images": [],
            "annotations": [],
            "categories": [],
            "type": "instance"
        }


    def parse_dota_line(line):
        """Parse a single DOTA annotation line."""
        parts = line.strip().split()
        points = list(map(float, parts[:8]))
        class_name = parts[8]
        # The 10th field is DOTA's difficulty flag. It is reused as iscrowd here,
        # which most COCO tooling treats as an ignore region; use 0 instead if
        # difficult objects should count as ordinary training targets.
        iscrowd = int(parts[9]) if len(parts) > 9 else 0
        return points, class_name, iscrowd


    def convert_dataset(dota_root, output_json):
        """Main conversion function."""
        # Initialize the COCO data structure
        coco_data = create_coco_template()

        # Example class list (adjust to your own data)
        CLASSES = ('small-vehicle', 'large-vehicle', 'ship')
        coco_data['categories'] = [
            {'id': i+1, 'name': name} for i, name in enumerate(CLASSES)
        ]
        class_to_id = {name: i+1 for i, name in enumerate(CLASSES)}

        # Walk through all annotation files (sorted for reproducible ids)
        image_id, anno_id = 0, 0
        label_dir = osp.join(dota_root, 'labelTxt')
        image_dir = osp.join(dota_root, 'images')

        for txt_name in sorted(os.listdir(label_dir)):
            if not txt_name.endswith('.txt'):
                continue

            # Image record
            image_name = txt_name.replace('.txt', '.png')
            image_path = osp.join(image_dir, image_name)
            if not osp.exists(image_path):
                continue
            with Image.open(image_path) as img:
                width, height = img.size
            coco_data['images'].append({
                'file_name': image_name,
                'height': height,
                'width': width,
                'id': image_id
            })

            # Annotation records
            txt_path = osp.join(label_dir, txt_name)
            with open(txt_path, 'r') as f:
                for line in f:
                    # Skip comments, blank lines, and header lines
                    # (anything with fewer than 8 coordinates + a class name)
                    if line.startswith('#') or len(line.split()) < 9:
                        continue
                    points, class_name, iscrowd = parse_dota_line(line)
                    if class_name not in class_to_id:
                        continue  # class not listed in CLASSES
                    x1, y1, x2, y2 = rotated_to_horizontal(points)
                    coco_data['annotations'].append({
                        'segmentation': [[x1,y1,x2,y1,x2,y2,x1,y2]],
                        'bbox': [x1, y1, x2-x1, y2-y1],
                        'area': (x2-x1) * (y2-y1),
                        'category_id': class_to_id[class_name],
                        'iscrowd': iscrowd,
                        'image_id': image_id,
                        'id': anno_id
                    })
                    anno_id += 1
            image_id += 1

        # Save the result
        with open(output_json, 'w') as f:
            json.dump(coco_data, f, indent=2)

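3. Visual Verification and Comparison

3.1 Visualizing with OpenCV

The visualization relies on OpenCV, which was already part of the install command in section 1.1. If you skipped that step:

    pip install opencv-python

Example visualization script:

    import cv2
    import numpy as np


    def plot_compare(img_path, dota_txt, coco_annos):
        """Show the original rotated boxes next to the converted horizontal boxes.

        coco_annos is the list of COCO annotation dicts belonging to this image.
        """
        img = cv2.imread(img_path)
        if img is None:
            raise FileNotFoundError(f"Cannot read image: {img_path}")
        img_h, img_w = img.shape[:2]

        # Draw the DOTA rotated boxes (red)
        with open(dota_txt, 'r') as f:
            for line in f:
                # Skip comments, blank lines, and header lines
                if line.startswith('#') or len(line.split()) < 9:
                    continue
                points = list(map(float, line.split()[:8]))
                pts = np.array(points, np.int32).reshape(-1, 2)
                cv2.polylines(img, [pts], isClosed=True, color=(0,0,255), thickness=2)

        # Draw the COCO horizontal boxes (green)
        for anno in coco_annos:
            x,y,w,h = anno['bbox']
            cv2.rectangle(img, (int(x),int(y)), (int(x+w),int(y+h)), (0,255,0), 2)

        # Scale for display
        scale = 800 / max(img_w, img_h)
        img = cv2.resize(img, None, fx=scale, fy=scale)
        cv2.imshow('Comparison', img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()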

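3.2 What the conversion typically looks like

The visual comparison shows the following.

Horizontal boxes:
- ✅ Simple to compute and compatible with every detection framework
- ❌ Include a lot of background for rotated objects (for example, a ship lying at a steep angle)
Rotated boxes:
- ✅ Tighter annotation, especially for tilted objects in remote sensing imagery
- ❌ Need a dedicated detection head (such as RRPN)

In a real project, if the detection framework supports rotated boxes (as MMRotate does), keeping the original DOTA format is the better choice. The approach in this article mainly addresses framework compatibility.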
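To put a number on the "too much background" drawback, one option is to compare the area of the original quadrilateral with the area of its horizontal box. The sketch below uses the shoelace formula; `polygon_area` and `fill_ratio` are hypothetical helper names introduced for illustration, and the input is the first sample annotation from section 1.2.

```python
def polygon_area(points):
    """Area of a simple polygon given as [x1, y1, ..., xn, yn] (shoelace formula)."""
    xs, ys = points[0::2], points[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def fill_ratio(points):
    """Fraction of the horizontal box that the original quadrilateral covers."""
    xs, ys = points[0::2], points[1::2]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return polygon_area(points) / box_area if box_area > 0 else 0.0


sample = [588.0, 438.0, 599.0, 426.0, 618.0, 437.0, 607.0, 449.0]
print(polygon_area(sample))          # 349.0
print(round(fill_ratio(sample), 3))  # 0.506
```

For this small vehicle the quadrilateral covers only about half of its horizontal box, so roughly half of the box is background.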

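4. Common Problems in Practice

4.1 Path configuration pitfalls

Symptoms:

- The script fails with "File not found"
- Image paths in the generated JSON are wrong

Fixes:

- Use osp.abspath to resolve absolute paths
- Keep relative paths consistent

    # Recommended path handling
    import os.path as osp

    base_dir = '/path/to/DOTA'
    image_dir = osp.abspath(osp.join(base_dir, 'images'))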
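One way to keep relative paths consistent is to store every `file_name` relative to a single image root and resolve absolute paths only when loading. The following is a minimal sketch under that assumption; `portable_file_name` is a hypothetical helper, not part of any framework.

```python
import os.path as osp


def portable_file_name(image_path, image_root):
    """Return image_path relative to image_root, using forward slashes."""
    rel = osp.relpath(osp.abspath(image_path), osp.abspath(image_root))
    return rel.replace(osp.sep, '/')


print(portable_file_name('/path/to/DOTA/images/P0001.png', '/path/to/DOTA/images'))
```

Written this way, the JSON stays valid when the dataset directory is moved, as long as the loader is pointed at the new image root.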

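4.2 Class mapping strategies

When the DOTA classes do not match what your project needs:

Merge similar classes:

    CLASS_MAPPING = {
        'small-vehicle': 'vehicle',
        'large-vehicle': 'vehicle',
        'van': 'vehicle'
    }

Filter out classes you do not need: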
```
IGNORE_CLASSES = {'harbor', 'bridge'}
```
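The two settings can be combined into one lookup that runs before a class name is turned into a `category_id`. This is a sketch only: `resolve_class` and `build_category_ids` are hypothetical helpers, and the first-seen ordering of ids is an assumption made for illustration.

```python
CLASS_MAPPING = {
    'small-vehicle': 'vehicle',
    'large-vehicle': 'vehicle',
    'van': 'vehicle',
}
IGNORE_CLASSES = {'harbor', 'bridge'}


def resolve_class(name):
    """Return the target class name, or None if the class is filtered out."""
    if name in IGNORE_CLASSES:
        return None
    # Classes without a mapping entry keep their original name
    return CLASS_MAPPING.get(name, name)


def build_category_ids(dota_names):
    """Assign 1-based category ids to the distinct target classes, in first-seen order."""
    ids = {}
    for name in dota_names:
        target = resolve_class(name)
        if target is not None and target not in ids:
            ids[target] = len(ids) + 1
    return ids


print(build_category_ids(['small-vehicle', 'ship', 'harbor', 'large-vehicle']))
# {'vehicle': 1, 'ship': 2}
```

In the conversion loop, an annotation whose `resolve_class` result is `None` would simply be skipped.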

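4.3 Handling large images

DOTA source images are large (up to roughly 4000×4000 pixels), so consider:

- Cutting them into tiles with a sliding window before converting
- Adjusting the detection framework's img_scale parameter

    # Example tiling code
    def split_image(img, tile_size=1024):
        height, width = img.shape[:2]
        tiles = []
        for y in range(0, height, tile_size):
            for x in range(0, width, tile_size):
                # Tiles on the right and bottom edges may be smaller than tile_size
                tile = img[y:y+tile_size, x:x+tile_size]
                tiles.append(tile)
        return tiles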
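Cutting the image is only half of the job: each box also has to be moved into the coordinate system of the tile it lands in. The sketch below shows one possible way to do that for COCO `[x, y, w, h]` boxes. `bbox_to_tile` and its `min_visible` threshold are assumptions introduced here, not part of the original script.

```python
def bbox_to_tile(bbox, tile_x, tile_y, tile_size=1024, min_visible=0.5):
    """Map an [x, y, w, h] box from full-image to tile coordinates.

    tile_x, tile_y is the tile's top-left corner in the full image.
    Returns the clipped box in tile coordinates, or None when less than
    `min_visible` of the original box area falls inside the tile.
    """
    x, y, w, h = bbox
    # Intersect the box with the tile, still in full-image coordinates
    left = max(x, tile_x)
    top = max(y, tile_y)
    right = min(x + w, tile_x + tile_size)
    bottom = min(y + h, tile_y + tile_size)
    if right <= left or bottom <= top:
        return None  # no overlap with this tile
    visible = (right - left) * (bottom - top)
    if w * h <= 0 or visible / (w * h) < min_visible:
        return None  # too little of the object is left in this tile
    # Shift the origin to the tile's top-left corner
    return [left - tile_x, top - tile_y, right - left, bottom - top]


box = [1000, 100, 50, 40]             # straddles the x = 1024 tile boundary
print(bbox_to_tile(box, 0, 0))        # None (only 48% visible)
print(bbox_to_tile(box, 1024, 0))     # [0, 100, 26, 40]
```

Dropping boxes that are mostly cut off avoids training on slivers of objects. Overlapping tiles are a common alternative when objects on tile borders matter.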

5. Performance Tuning and Batch Processing

5.1 Speeding things up with multiple processes

For large datasets, use multiprocessing:

    from multiprocessing import Pool


    def process_single(args):
        img_path, txt_path = args
        # per-image processing logic
        ...


    if __name__ == '__main__':
        file_pairs = [...]  # all (image, annotation) pairs
        with Pool(8) as p:  # 8 worker processes
            p.map(process_single, file_pairs)
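Worker processes cannot share the `image_id` and `anno_id` counters that the sequential script in section 2.2 increments, so one possible design is to let each worker return its boxes without ids and assign the ids in the parent process afterwards. The names `parse_one`, `merge_results`, and `convert_parallel` below are hypothetical, and the sketch keeps only the `bbox` field for brevity.

```python
from multiprocessing import Pool


def parse_one(txt_path):
    """Worker: return (txt_path, list of [x, y, w, h] boxes) without any ids."""
    boxes = []
    with open(txt_path, 'r') as f:
        for line in f:
            parts = line.split()
            if len(parts) < 9:
                continue  # blank or header line
            xs = [float(v) for v in parts[0:8:2]]
            ys = [float(v) for v in parts[1:8:2]]
            boxes.append([min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)])
    return txt_path, boxes


def merge_results(results):
    """Parent: assign image and annotation ids in a deterministic order."""
    annotations = []
    for image_id, (_, boxes) in enumerate(sorted(results)):
        for bbox in boxes:
            annotations.append(
                {'id': len(annotations), 'image_id': image_id, 'bbox': bbox}
            )
    return annotations


def convert_parallel(txt_paths, workers=8):
    """Parse files in parallel, then number everything in one place."""
    with Pool(workers) as pool:
        results = pool.map(parse_one, txt_paths)
    return merge_results(results)
```

Sorting by file path before numbering keeps the ids identical from run to run, regardless of how the work was split across processes. On platforms that spawn worker processes, `convert_parallel` should be called from under an `if __name__ == '__main__':` guard.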

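5.2 Incremental updates

When new samples are appended to the dataset, avoid regenerating everything from scratch:

    import json


    def update_coco_json(existing_json, new_annotations):
        """Append annotations to an existing COCO file, continuing the id sequence.

        Only annotations are appended here; entries for new images need to be
        added to data['images'] separately.
        """
        with open(existing_json) as f:
            data = json.load(f)

        # default=-1 keeps this working when the file has no annotations yet
        last_anno_id = max((anno['id'] for anno in data['annotations']), default=-1)

        for anno in new_annotations:
            last_anno_id += 1
            anno['id'] = last_anno_id
            data['annotations'].append(anno)

        with open(existing_json, 'w') as f:
            json.dump(data, f)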

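5.3 Validation and repair tools

A small helper for checking data quality:

    import json
    import os.path as osp


    def validate_coco_file(json_path, image_dir=''):
        """Check a COCO file for missing images and out-of-bounds boxes.

        image_dir is the directory that the file_name entries are relative to.
        """
        with open(json_path) as f:
            data = json.load(f)

        errors = []

        # Check that every image exists
        for img in data['images']:
            if not osp.exists(osp.join(image_dir, img['file_name'])):
                errors.append(f"Missing image: {img['file_name']}")

        # Check that every box lies inside its image
        images_by_id = {img['id']: img for img in data['images']}
        for anno in data['annotations']:
            img = images_by_id.get(anno['image_id'])
            if img is None:
                errors.append(f"Annotation {anno['id']} references unknown image {anno['image_id']}")
                continue
            x,y,w,h = anno['bbox']
            if x < 0 or y < 0 or x+w > img['width'] or y+h > img['height']:
                errors.append(f"Invalid bbox in image {img['file_name']}")

        return errors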
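Beyond missing files and out-of-bounds boxes, id consistency is another thing worth checking, since duplicate ids or annotations pointing at a nonexistent category tend to fail late, inside the training framework. The sketch below is one possible extension; `find_id_problems` is a hypothetical helper that works on the already-loaded COCO dict.

```python
from collections import Counter


def find_id_problems(data):
    """Return messages about duplicate or dangling ids in a COCO dict."""
    problems = []
    # Ids must be unique within each of the three top-level lists
    for key in ('images', 'annotations', 'categories'):
        counts = Counter(item['id'] for item in data.get(key, []))
        for dup_id, n in counts.items():
            if n > 1:
                problems.append(f"Duplicate {key} id {dup_id} ({n} times)")
    # Every annotation must point at a defined category
    category_ids = {c['id'] for c in data.get('categories', [])}
    for anno in data.get('annotations', []):
        if anno['category_id'] not in category_ids:
            problems.append(
                f"Annotation {anno['id']} uses unknown category_id {anno['category_id']}"
            )
    return problems
```

An empty list means no id problems were found.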
