
A Step-by-Step Guide to Converting DOTA Remote-Sensing Annotations to COCO Format (with Complete Python Code)

2026/7/19 10:28:31

From DOTA to COCO: A Practical Guide to Converting Remote-Sensing Annotation Formats

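In computer vision, a consistent annotation format has a direct effect on how efficiently and how well a model can be trained. DOTA is an important benchmark for remote-sensing image analysis, and it annotates each object with an arbitrary quadrilateral that follows the object's actual orientation and extent. COCO, the standard input format for general-purpose object detection frameworks, uses simple rectangular boxes instead. This article looks at the core differences between the two formats and provides a complete Python solution for converting from one to the other.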

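1. Understanding the Essential Differences Between DOTA and COCO Annotations

DOTA is designed for aerial image analysis. Its annotations are stored as TXT files, and each object is defined by four vertex coordinates (x1,y1,x2,y2,x3,y3,x4,y4). This representation can describe an object at any orientation: an aircraft parked at an angle or a ship in mid-turn can both be annotated accurately by their quadrilateral vertices.

COCO, by contrast, organizes its data in a JSON file, and the core of each annotation is a rectangle given by its top-left corner plus width and height (x,y,width,height). This Axis-Aligned Bounding Box (AABB) representation discards the object's orientation, but it reduces the computational complexity of the detection task and is natively compatible with most detection frameworks, such as MMDetection and Detectron2.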
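To make the target layout concrete, here is a minimal sketch of a COCO-style detection file built as a Python dict. The file name, image size, and box values are made-up illustration values, and `bbox_to_corners` is a hypothetical helper; only the key names follow the COCO detection format described above.

```python
import json

# Minimal COCO-style detection structure (illustrative values only).
coco = {
    "images": [
        {"id": 1, "file_name": "P0001.png", "width": 1024, "height": 1024},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 200.0, 50.0, 30.0],  # [x, y, width, height]
            "area": 50.0 * 30.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "plane", "supercategory": "plane"},
    ],
}


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, w, h] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


print(json.dumps(coco["annotations"][0], indent=2))
print(bbox_to_corners(coco["annotations"][0]["bbox"]))
```

Note that every annotation points back to its image and category by ID, which is why the conversion has to manage three ID spaces at once.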

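Key differences at a glance:

Feature	DOTA format	COCO format
Geometry	Arbitrary quadrilateral	Axis-aligned rectangle
Coordinate values	8 values (4 points)	4 values (x,y,w,h)
File structure	One TXT file per image	One JSON file for the whole dataset
Category definition	At the end of each annotation line	Defined centrally in categories
Typical use	High-precision, orientation-sensitive tasks	General-purpose object detection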

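2. Core Logic and Implementation of the Conversion

The heart of the conversion is turning an arbitrary quadrilateral into its minimum enclosing axis-aligned rectangle. To do this, take the minimum and maximum of all x coordinates and the minimum and maximum of all y coordinates; together they define a rectangle that fully contains the original quadrilateral.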

    def convert_dota_to_coco(dota_coords):
        """
        Convert a DOTA quadrilateral to its enclosing axis-aligned rectangle.

        Args:
            dota_coords: list of 8 numbers [x1, y1, x2, y2, x3, y3, x4, y4]
        Returns:
            The rectangle in corner form (x_min, y_min, x_max, y_max);
            a COCO bbox is then [x_min, y_min, x_max - x_min, y_max - y_min]
        """
        x_coords = dota_coords[0::2]  # all x coordinates
        y_coords = dota_coords[1::2]  # all y coordinates
        x_min = min(x_coords)
        x_max = max(x_coords)
        y_min = min(y_coords)
        y_max = max(y_coords)
        return (x_min, y_min, x_max, y_max)
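As a quick worked example of the min/max rule, the sketch below applies it to a square rotated by 45 degrees and returns the result directly in COCO's [x, y, width, height] order. The helper name `quad_to_xywh` and the sample coordinates are illustrative assumptions, not part of the original script.

```python
def quad_to_xywh(quad):
    """Axis-aligned box [x, y, w, h] enclosing an 8-value DOTA quadrilateral."""
    xs, ys = quad[0::2], quad[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# A square rotated by 45 degrees (a diamond) centred at (10, 10).
diamond = [10, 0, 20, 10, 10, 20, 0, 10]
print(quad_to_xywh(diamond))  # [0, 0, 20, 20]
```

The diamond covers 200 square pixels while its enclosing box covers 400, so for strongly rotated objects the converted box can be twice as large as the original annotation.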

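In a real project you also need to handle the following points:

Boundary handling: make sure the converted rectangle does not extend beyond the image
Category mapping: build the mapping from DOTA category names to COCO category_id values
Annotation ID generation: assign a unique ID to every annotated object
Image metadata: collect each image's width, height, and other metadata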
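The boundary-handling item in the list above can be sketched as a small clipping step applied to the corner-form rectangle before it is written out. This is one possible approach under stated assumptions: `clip_box` is a hypothetical helper, and dropping boxes that end up with zero width or height is a design choice rather than something the original script does.

```python
def _clamp(value, low, high):
    """Clamp a number into [low, high] and return it as a float."""
    return float(min(max(value, low), high))


def clip_box(x_min, y_min, x_max, y_max, width, height):
    """Clip a corner-form box to the image area [0, width] x [0, height].

    Returns None when nothing of the box remains inside the image.
    """
    x_min = _clamp(x_min, 0, width)
    x_max = _clamp(x_max, 0, width)
    y_min = _clamp(y_min, 0, height)
    y_max = _clamp(y_max, 0, height)
    if x_max <= x_min or y_max <= y_min:
        return None
    return (x_min, y_min, x_max, y_max)


# A box hanging over the left and bottom edges of a 100 x 100 image.
print(clip_box(-5, 10, 50, 120, 100, 100))  # (0.0, 10.0, 50.0, 100.0)
```

Remember to recompute the COCO `bbox` and `area` from the clipped corners, otherwise the two fields will disagree.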

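Note: the last two values of a DOTA annotation line are the category name and the difficult flag, and both must be parsed correctly. This guide maps the difficult flag to COCO's iscrowd field, a common shortcut even though the two are not semantically identical: iscrowd marks groups of objects rather than single instances that are hard to recognise.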

3. A Complete Implementation

The following Python script performs the batch conversion from DOTA format to COCO format:

    import json
    import os
    from typing import Dict, List, Optional, Tuple

    from PIL import Image


    class DOTA2COCOConverter:
        def __init__(self, class_names: List[str]):
            """
            Initialise the converter.
            :param class_names: ordered list of category names
            """
            self.categories = []
            self.class_name2id = {}
            # Build the COCO categories list and the name-to-ID mapping (IDs start at 1)
            for idx, name in enumerate(class_names, start=1):
                self.categories.append({
                    "id": idx,
                    "name": name,
                    "supercategory": name
                })
                self.class_name2id[name] = idx

        @staticmethod
        def convert_dota_to_coco(coords: List[float]) -> Tuple[float, float, float, float]:
            """Return (x_min, y_min, x_max, y_max) enclosing the quadrilateral."""
            x_coords, y_coords = coords[0::2], coords[1::2]
            return min(x_coords), min(y_coords), max(x_coords), max(y_coords)

        def parse_dota_line(self, line: str) -> Optional[Tuple[List[float], str, int]]:
            """
            Parse one DOTA annotation line.
            :param line: e.g. "x1 y1 x2 y2 x3 y3 x4 y4 class_name difficult"
            :return: (coords, class_name, iscrowd), or None for non-annotation lines
            """
            parts = line.strip().split()
            if len(parts) < 9:
                # Blank lines and metadata headers such as "imagesource:..." or "gsd:..."
                return None
            coords = list(map(float, parts[:8]))
            class_name = parts[8]
            iscrowd = int(parts[9]) if len(parts) > 9 else 0
            return coords, class_name, iscrowd

        def convert_annotation(self, coords: List[float], class_name: str,
                               iscrowd: int, annotation_id: int, image_id: int) -> Dict:
            """
            Convert a single annotation to COCO format.
            """
            x_min, y_min, x_max, y_max = self.convert_dota_to_coco(coords)
            return {
                # The segmentation is the rectangle itself, not the original quadrilateral
                "segmentation": [[x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]],
                "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
                "area": (x_max - x_min) * (y_max - y_min),
                "category_id": self.class_name2id[class_name],
                "iscrowd": iscrowd,
                "image_id": image_id,
                "id": annotation_id
            }

        def process_image(self, image_path: str, label_path: str,
                          image_id: int, annotation_id: int) -> Tuple[Dict, List[Dict], int]:
            """
            Process one image and its label file.
            :return: (image_info, annotations, next free annotation_id)
            """
            annotations = []
            # Read the image size
            with Image.open(image_path) as img:
                width, height = img.size
            # Parse the DOTA label file
            with open(label_path, 'r') as f:
                for line in f:
                    parsed = self.parse_dota_line(line)
                    if parsed is None:
                        continue
                    coords, class_name, iscrowd = parsed
                    annotation = self.convert_annotation(
                        coords, class_name, iscrowd, annotation_id, image_id
                    )
                    annotations.append(annotation)
                    annotation_id += 1
            image_info = {
                "file_name": os.path.basename(image_path),
                "height": height,
                "width": width,
                "id": image_id
            }
            return image_info, annotations, annotation_id

        def convert_dataset(self, image_dir: str, label_dir: str, output_path: str):
            """
            Convert the whole dataset.
            """
            coco_data = {
                "images": [],
                "annotations": [],
                "categories": self.categories
            }
            annotation_id = 1
            image_id = 1
            # Walk the label directory in sorted order so that IDs are reproducible
            for label_file in sorted(os.listdir(label_dir)):
                if not label_file.endswith('.txt'):
                    continue
                base_name = os.path.splitext(label_file)[0]
                image_file = base_name + '.png'  # assumes the images are PNG files
                image_path = os.path.join(image_dir, image_file)
                label_path = os.path.join(label_dir, label_file)
                if not os.path.exists(image_path):
                    continue
                image_info, annotations, annotation_id = self.process_image(
                    image_path, label_path, image_id, annotation_id
                )
                coco_data["images"].append(image_info)
                coco_data["annotations"].extend(annotations)
                image_id += 1
            # Save as a JSON file, creating the output directory if needed
            os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
            with open(output_path, 'w') as f:
                json.dump(coco_data, f, indent=2)


    # Usage example
    if __name__ == "__main__":
        # The 15 default categories of the DOTA dataset
        CLASSES = [
            'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
            'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
            'basketball-court', 'storage-tank', 'soccer-ball-field',
            'roundabout', 'harbor', 'swimming-pool', 'helicopter'
        ]
        converter = DOTA2COCOConverter(CLASSES)
        converter.convert_dataset(
            image_dir='path/to/images',
            label_dir='path/to/labels',
            output_path='output/annotations.json'
        )

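4. Verifying the Converted Result

After the conversion you must verify that the result is correct. The following checks are recommended:

Visual inspection: draw the converted COCO annotations on the original image and confirm that each box reasonably encloses its target
Statistical check: confirm that the number of instances per category is the same before and after conversion
Area comparison: compare the area of each original quadrilateral with the area of its converted rectangle to make sure the box has not inflated excessively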

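    import cv2


    def visualize_coco_annotation(image_path, coco_annotations, categories):
        """
        Draw COCO annotations on an image.
        :param coco_annotations: the annotations that belong to this image
        :param categories: the "categories" list from the COCO file
        """
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Cannot read image: {image_path}")
        for ann in coco_annotations:
            x, y, w, h = ann['bbox']
            cv2.rectangle(image, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
            # Add the category label, falling back to the numeric ID if the name is unknown
            class_name = next(
                (cat['name'] for cat in categories if cat['id'] == ann['category_id']),
                str(ann['category_id'])
            )
            cv2.putText(image, class_name, (int(x), int(y - 5)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        cv2.imshow('Visualization', image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()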
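The statistical and area checks from the list above need no image I/O, so they can be sketched with plain Python. The helper names below are hypothetical; the quadrilateral area uses the shoelace formula, which assumes the four vertices are listed in order around the shape, as in DOTA labels.

```python
from collections import Counter


def polygon_area(quad):
    """Shoelace formula for an 8-value [x1, y1, ..., x4, y4] polygon."""
    xs, ys = quad[0::2], quad[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def inflation_ratio(quad):
    """Area of the enclosing axis-aligned box divided by the quadrilateral area."""
    xs, ys = quad[0::2], quad[1::2]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    quad_area = polygon_area(quad)
    return box_area / quad_area if quad_area > 0 else float("inf")


def coco_counts(coco):
    """Number of annotations per category name in a COCO dict."""
    id2name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id2name[a["category_id"]] for a in coco["annotations"])


# A 45-degree diamond: its enclosing box is twice as large as the shape itself.
print(inflation_ratio([10, 0, 20, 10, 10, 20, 0, 10]))  # 2.0
```

Comparing `coco_counts(...)` with a `Counter` of the class names read from the DOTA label files gives the per-category check; what ratio counts as "excessive" inflation depends on the dataset and is left to the reader.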

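Troubleshooting checklist:

Out-of-bounds coordinates: check whether any converted rectangle extends beyond the image size
Missing categories: confirm that every DOTA category has a corresponding COCO category_id
ID conflicts: make sure every annotation and every image has a unique ID
File paths: verify that the image paths stored in the JSON are correct relative paths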
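Most of this checklist can be automated. The following is a minimal sketch of such a checker, assuming the COCO data is already loaded as a dict with the keys used in this article; `check_coco` is a hypothetical helper and the wording of the messages is arbitrary.

```python
import os


def check_coco(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    problems = []
    images = {img["id"]: img for img in coco["images"]}
    if len(images) != len(coco["images"]):
        problems.append("duplicate image ids")
    for img in coco["images"]:
        if os.path.isabs(img["file_name"]):
            problems.append(f"image {img['id']}: file_name is an absolute path")
    category_ids = {c["id"] for c in coco["categories"]}
    seen_ids = set()
    for ann in coco["annotations"]:
        if ann["id"] in seen_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_ids.add(ann["id"])
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: bbox outside image")
    return problems


sample = {
    "images": [{"id": 1, "file_name": "a.png", "width": 100, "height": 100}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [95, 95, 10, 10]}],
    "categories": [{"id": 1, "name": "plane"}],
}
for problem in check_coco(sample):
    print(problem)
```

An empty list means none of the listed problems were detected; it does not prove that the boxes are semantically correct, so the visual check is still worth doing.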

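5. Advanced Techniques and Performance Optimization

For large datasets, consider the following optimizations:

Parallel processing: use multiple processes to speed up image reading and annotation conversion
Incremental updates: support appending new data to an existing COCO annotation file
Memory optimization: process very large images in tiles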

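    from multiprocessing import Pool

    # DOTA2COCOConverter and CLASSES come from the full script in section 3


    def parallel_convert(args):
        """
        Worker function: convert one image/label pair.
        """
        image_path, label_path, image_id = args
        converter = DOTA2COCOConverter(CLASSES)
        # The number of annotations per image is unknown in advance, so IDs start
        # at 1 here and are renumbered once all results have been collected
        image_info, annotations, _ = converter.process_image(
            image_path, label_path, image_id, 1
        )
        return image_info, annotations


    # Used in place of the loop in convert_dataset; task_args is a list of
    # (image_path, label_path, image_id) tuples built from the label directory
    with Pool(processes=4) as pool:  # use 4 processes
        results = pool.map(parallel_convert, task_args)

    annotation_id = 1
    for image_info, annotations in results:
        for ann in annotations:
            ann["id"] = annotation_id  # make annotation IDs globally unique
            annotation_id += 1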
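The incremental-update idea from the list above can be sketched as a merge that shifts the new IDs past the ones already in use. This is only one possible design: `append_to_coco` is a hypothetical helper, and it assumes both files share an identical categories list and that the IDs in the new data are positive integers.

```python
def append_to_coco(existing, new):
    """Append the images and annotations of `new` to `existing` in place.

    IDs from `new` are shifted by the largest ID already present, so they
    cannot collide with existing ones.
    """
    if existing["categories"] != new["categories"]:
        raise ValueError("category lists differ; remap category_id values first")
    image_offset = max((img["id"] for img in existing["images"]), default=0)
    ann_offset = max((ann["id"] for ann in existing["annotations"]), default=0)
    for img in new["images"]:
        existing["images"].append({**img, "id": img["id"] + image_offset})
    for ann in new["annotations"]:
        existing["annotations"].append({
            **ann,
            "id": ann["id"] + ann_offset,
            "image_id": ann["image_id"] + image_offset,
        })
    return existing


categories = [{"id": 1, "name": "plane", "supercategory": "plane"}]
base = {
    "images": [{"id": 1, "file_name": "a.png", "width": 100, "height": 100}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]}],
    "categories": categories,
}
extra = {
    "images": [{"id": 1, "file_name": "b.png", "width": 100, "height": 100}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [1, 1, 5, 5]}],
    "categories": categories,
}
merged = append_to_coco(base, extra)
print([img["id"] for img in merged["images"]])       # [1, 2]
print([ann["id"] for ann in merged["annotations"]])  # [1, 2]
```

In practice the two dicts would be loaded with `json.load` and the merged result written back with `json.dump`, as in the main script.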

Additional considerations for special scenarios: