
DOTA1.5 Dataset Processing in Practice: Cutting Large Images and Converting to YOLO/VOC Formats with Python Scripts

news 2026/7/15 11:06:29


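In remote-sensing object detection, the DOTA1.5 dataset has become an important benchmark for validating algorithms because its images are high-resolution and very large. Feeding the original images straight into a model, however, tends to exhaust GPU memory and miss small objects. This article walks through three core tasks in Python: sliding-window cutting into 1920×1920 tiles, conversion to YOLO format, and generation of VOC format, ending with a standard dataset that is ready for training.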

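1. Environment Setup and Understanding the Data

Before processing DOTA1.5, set up a suitable development environment and get to know how the data is organized. Python 3.8+ and the following key libraries are recommended: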

pip install pillow opencv-python numpy matplotlib lxml

The dataset directory is usually laid out as follows:

DOTA-v1.5/
├── images/
│   ├── P0001.png
│   └── ...
└── labelTxt/
    ├── P0001.txt
    └── ...

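The original annotation files (.txt) describe one object per line as a quadrilateral: eight numbers giving the x, y coordinates of its four vertices, followed by the category label and a difficulty flag. This is an oriented bounding box (OBB), not a horizontal one; the horizontal bounding box (HBB) needed by YOLO and VOC is obtained by taking the minimum and maximum of those coordinates. For example: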

1468 1065 1664 1065 1664 1121 1468 1121 large-vehicle 0
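To make the line format concrete, here is a minimal parsing sketch. The names `DotaObject`, `parse_dota_line` and `to_hbb` are illustrative and not part of any DOTA toolkit, and the sketch assumes the difficulty flag may be absent, in which case it defaults to 0.

```python
from typing import List, NamedTuple, Tuple


class DotaObject(NamedTuple):
    points: List[Tuple[float, float]]  # four (x, y) vertices
    category: str
    difficult: int


def parse_dota_line(line: str) -> DotaObject:
    """Parse one annotation line: 8 coordinates, category, optional difficulty flag."""
    parts = line.split()
    if len(parts) < 9:
        raise ValueError(f"Expected at least 9 fields, got {len(parts)}: {line!r}")
    coords = [float(v) for v in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    difficult = int(parts[9]) if len(parts) > 9 else 0  # assumption: default to 0
    return DotaObject(points, parts[8], difficult)


def to_hbb(obj: DotaObject) -> Tuple[float, float, float, float]:
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing the four vertices."""
    xs = [p[0] for p in obj.points]
    ys = [p[1] for p in obj.points]
    return min(xs), min(ys), max(xs), max(ys)


obj = parse_dota_line("1468 1065 1664 1065 1664 1121 1468 1121 large-vehicle 0")
print(obj.category, to_hbb(obj))
```

For the sample line above the enclosing box is (1468, 1065, 1664, 1121), since that quadrilateral happens to be axis-aligned already.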

Note: DOTA1.5 has 16 categories, from 'plane' to 'container-crane', so prepare the category list in advance.

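2. Large-Image Cutting Strategy and Implementation

Images larger than 1920×1920 pixels are cut with a sliding window. The key parameters are:

- Window size: 1920×1920
- Overlap: 250 pixels (so that an object split by one window boundary appears whole in the neighboring window)
- Edge handling: the last window is shifted back so that it is also a full-size tile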
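As a quick sanity check on these parameters, the number of tiles along one axis can be worked out in advance. The helper below is a small sketch (`tiles_per_axis` is a hypothetical name); it assumes the last window is shifted back to end exactly at the image edge, as described in the edge-handling rule.

```python
def tiles_per_axis(length: int, window_size: int = 1920, overlap: int = 250) -> int:
    """Number of windows needed to cover `length` pixels along one axis."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    if length <= window_size:
        return 1
    step = window_size - overlap
    # Integer ceiling of (length - window_size) / step, plus the first window
    return (length - window_size + step - 1) // step + 1


# A 4000x4000 image with the default parameters (step = 1670):
# windows start at 0, 1670 and 2080, i.e. 3 per axis and 9 tiles in total.
n = tiles_per_axis(4000)
print(n, n * n)
```

Multiplying the per-axis counts for width and height gives the number of tiles an image will produce, which helps when estimating disk usage before a full run.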

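Core logic:

from pathlib import Path
from PIL import Image

def window_starts(length, window_size, step):
    # Start offsets along one axis; the last window is shifted back
    # so that it ends exactly at the image edge (no duplicate tiles)
    if length <= window_size:
        return [0]
    starts = list(range(0, length - window_size, step))
    starts.append(length - window_size)
    return starts

def sliding_window_cut(image_path, output_dir, window_size=1920, overlap=250):
    image_path = Path(image_path)
    img = Image.open(image_path)
    width, height = img.size
    if width <= window_size and height <= window_size:
        return  # Skip images that do not need cutting (copy them separately)
    step = window_size - overlap
    for left in window_starts(width, window_size, step):
        for upper in window_starts(height, window_size, step):
            # Clamp the window to the image when one side is smaller than the window
            right = min(left + window_size, width)
            lower = min(upper + window_size, height)
            # Cut and save; the file name records the tile's real offset,
            # which is what the annotations must be shifted by
            patch = img.crop((left, upper, right, lower))
            patch.save(Path(output_dir) / f"{image_path.stem}_{left}_{upper}.png")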

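In practice the annotations have to be converted in step with the images. When an image is cut, the original annotations need to be adjusted accordingly:

- Filter out objects that lie entirely outside the current window
- Shift the coordinates of the remaining objects by subtracting the window offset
- Clip objects that straddle the window boundary so the annotation stays valid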
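The three steps above can be sketched as a single function over horizontal boxes. This is an illustration rather than the article's own implementation: `boxes_for_window` is a hypothetical name, and the `min_visible` threshold (drop a box when less than half of its area remains inside the window) is an assumption that you may want to tune.

```python
from typing import List, Tuple

# (category, xmin, ymin, xmax, ymax) in pixel coordinates of the full image
Box = Tuple[str, float, float, float, float]


def boxes_for_window(boxes: List[Box], left: int, upper: int,
                     window_size: int = 1920, min_visible: float = 0.5) -> List[Box]:
    """Return the boxes visible in one window, in tile-local coordinates."""
    right, lower = left + window_size, upper + window_size
    kept = []
    for cls, xmin, ymin, xmax, ymax in boxes:
        # Intersection of the box with the window
        ixmin, iymin = max(xmin, left), max(ymin, upper)
        ixmax, iymax = min(xmax, right), min(ymax, lower)
        if ixmin >= ixmax or iymin >= iymax:
            continue  # entirely outside the window
        area = (xmax - xmin) * (ymax - ymin)
        visible = (ixmax - ixmin) * (iymax - iymin)
        if area <= 0 or visible / area < min_visible:
            continue  # degenerate box, or too little of it remains
        # Clip to the window, then subtract the window offset
        kept.append((cls, ixmin - left, iymin - upper, ixmax - left, iymax - upper))
    return kept


boxes = [("large-vehicle", 1468, 1065, 1664, 1121), ("ship", 1900, 100, 2000, 200)]
print(boxes_for_window(boxes, left=0, upper=0))
print(boxes_for_window(boxes, left=1670, upper=0))
```

With the default overlap of 250 pixels, the ship that is mostly cut off in the first window is fully contained in the second one, which is exactly what the overlap is for.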

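3. YOLO Format Conversion in Detail

YOLO format describes each object by its normalized center coordinates, width and height. The conversion formulas are:

x_center = (x_min + x_max) / 2 / image_width
y_center = (y_min + y_max) / 2 / image_height
width = (x_max - x_min) / image_width
height = (y_max - y_min) / image_height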

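A complete conversion function, for example:

import numpy as np

def dota_to_yolo(ann_file, output_file, class_list, img_w, img_h):
    with open(ann_file) as f:
        # Skip the first two lines, which are metadata headers rather than objects
        lines = [line.strip() for line in f.readlines()[2:]]
    objects = []
    for line in lines:
        parts = line.split()
        if len(parts) < 9:
            continue  # Skip blank or malformed lines
        points = np.array(parts[:8], dtype=float)
        cls = parts[8]
        # Compute the HBB that encloses the four vertices
        xmin, ymin = points[::2].min(), points[1::2].min()
        xmax, ymax = points[::2].max(), points[1::2].max()
        # Convert to YOLO format: center and size, normalized by the image size
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        objects.append((cls, x_center, y_center, width, height))
    # Write the YOLO-format file
    with open(output_file, 'w') as f:
        for cls, x_center, y_center, width, height in objects:
            cls_idx = class_list.index(cls)
            f.write(f"{cls_idx} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")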

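Tip: YOLO coordinates are relative values (between 0 and 1), so after cutting they must be recomputed against the size of each tile, not the original image.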

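4. Tips for Generating VOC Format

PASCAL VOC stores annotations in XML files that hold each object's category and absolute pixel coordinates. When converting, note the following:

- The image size information must be accurate
- Every object needs a complete bounding-box description
- Extra fields (such as difficult and truncated) can be added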

Core structure of the converter class:

from pathlib import Path
from xml.dom.minidom import Document

import cv2

class YOLO2VOCConverter:
    def __init__(self, classes):
        self.classes = classes  # Category list

    def convert_single_file(self, yolo_file, image_file, output_xml):
        # Read the image size
        img = cv2.imread(str(image_file))
        if img is None:
            raise FileNotFoundError(f"Cannot read image: {image_file}")
        h, w = img.shape[:2]
        # Create the XML document structure
        doc = Document()
        annotation = doc.createElement("annotation")
        doc.appendChild(annotation)
        # Add the basic image information
        self._add_basic_info(doc, annotation, Path(image_file).name, w, h)
        # Process each YOLO annotation line
        with open(yolo_file) as f:
            for line in f:
                if not line.strip():
                    continue  # Skip blank lines
                cls_idx, xc, yc, bw, bh = map(float, line.split())
                self._add_object(doc, annotation, int(cls_idx), xc, yc, bw, bh, w, h)
        # Save the XML file
        with open(output_xml, 'w') as f:
            doc.writexml(f, indent='', addindent=' ', newl='\n')

    def _add_basic_info(self, doc, parent, filename, width, height):
        # Stub: add folder/filename/size and other basic information
        pass

    def _add_object(self, doc, parent, cls_idx, xc, yc, bw, bh, img_w, img_h):
        # Stub: add a single object (name and bndbox in pixel coordinates)
        pass
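The two helper methods are left as stubs above. One possible way to fill them in is sketched below as standalone functions using only the standard library. The function names are hypothetical, and the choices made here (rounding to whole pixels, clamping to the image, a fixed `depth` of 3, `difficult` always 0) are assumptions rather than requirements of the VOC format.

```python
from xml.dom.minidom import Document


def yolo_to_voc_box(xc, yc, bw, bh, img_w, img_h):
    """Normalized YOLO box -> (xmin, ymin, xmax, ymax) in pixels, clamped to the image."""
    xmin = max(0, round((xc - bw / 2) * img_w))
    ymin = max(0, round((yc - bh / 2) * img_h))
    xmax = min(img_w, round((xc + bw / 2) * img_w))
    ymax = min(img_h, round((yc + bh / 2) * img_h))
    return xmin, ymin, xmax, ymax


def _add_text(doc, parent, tag, value):
    """Append <tag>value</tag> to parent and return the new element."""
    node = doc.createElement(tag)
    node.appendChild(doc.createTextNode(str(value)))
    parent.appendChild(node)
    return node


def build_voc_xml(filename, img_w, img_h, objects, depth=3):
    """objects: iterable of (class_name, xc, yc, bw, bh) in normalized YOLO form."""
    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)
    _add_text(doc, annotation, "filename", filename)
    size = doc.createElement("size")
    annotation.appendChild(size)
    for tag, value in (("width", img_w), ("height", img_h), ("depth", depth)):
        _add_text(doc, size, tag, value)
    for name, xc, yc, bw, bh in objects:
        obj = doc.createElement("object")
        annotation.appendChild(obj)
        _add_text(doc, obj, "name", name)
        _add_text(doc, obj, "difficult", 0)  # assumption: no difficulty info kept
        bndbox = doc.createElement("bndbox")
        obj.appendChild(bndbox)
        box = yolo_to_voc_box(xc, yc, bw, bh, img_w, img_h)
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            _add_text(doc, bndbox, tag, value)
    return doc.toprettyxml(indent="  ")


xml_text = build_voc_xml("P0001_0_0.png", 1920, 1920,
                         [("large-vehicle", 1566 / 1920, 1093 / 1920, 196 / 1920, 56 / 1920)])
print(xml_text)
```

Inside the class, `_add_basic_info` would correspond to the filename and size part and `_add_object` to the body of the loop, with `self.classes[cls_idx]` supplying the name.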

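In a real project, multiprocessing is worth adding to speed up conversion:

from multiprocessing import Pool

# classes (the category list) and file_pairs (a list of
# (yolo_file, image_file, output_xml) tuples) must be defined at module level

def batch_convert(args):
    converter = YOLO2VOCConverter(classes)
    converter.convert_single_file(*args)

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        pool.map(batch_convert, file_pairs)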

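5. Quality Validation and Visualization

After the format conversion, the annotations must be checked for correctness. Two approaches are recommended:

Method 1: Drawing the bounding boxes back onto the image

import cv2
import matplotlib.pyplot as plt

def visualize_annotations(image_path, annotation_path, fmt='voc'):
    # parse_voc_annotation and parse_yolo_annotation are helpers you need to
    # provide; each should return (xmin, ymin, xmax, ymax) boxes in pixels
    img = cv2.imread(str(image_path))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    if fmt == 'voc':
        # Parse the XML annotation
        boxes = parse_voc_annotation(annotation_path)
    else:  # yolo
        boxes = parse_yolo_annotation(annotation_path, img.shape[:2])
    # Draw the bounding boxes (cv2.rectangle needs integer coordinates)
    for box in boxes:
        x1, y1, x2, y2 = (int(v) for v in box[:4])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    plt.imshow(img)
    plt.show()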
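The two parsing helpers called in the function above are not defined in the article. A minimal sketch of what they might look like follows; it assumes YOLO label lines of the form `class xc yc w h` and VOC files with `object/bndbox` elements, and it returns integer pixel boxes so they can be passed directly to `cv2.rectangle`.

```python
import xml.etree.ElementTree as ET


def parse_yolo_annotation(annotation_path, image_shape):
    """Read a YOLO label file; image_shape is (height, width) as in img.shape[:2]."""
    img_h, img_w = image_shape
    boxes = []
    with open(annotation_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            xc, yc, bw, bh = (float(v) for v in parts[1:])
            boxes.append((
                round((xc - bw / 2) * img_w),
                round((yc - bh / 2) * img_h),
                round((xc + bw / 2) * img_w),
                round((yc + bh / 2) * img_h),
            ))
    return boxes


def parse_voc_annotation(annotation_path):
    """Read a VOC XML file and return (xmin, ymin, xmax, ymax) pixel boxes."""
    root = ET.parse(annotation_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(tuple(
            int(float(bndbox.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ))
    return boxes
```

Both functions deliberately drop the class label; if you want labels drawn next to the boxes, return the class alongside the coordinates and adjust the drawing loop accordingly.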

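Method 2: Statistical validation

import xml.etree.ElementTree as ET
from pathlib import Path

def validate_annotations(annotations_dir, image_dir):
    # image_dir is not used by the checks below; it is kept for a later
    # comparison against the real image sizes
    issues = []
    for ann_file in Path(annotations_dir).glob('*.xml'):
        try:
            tree = ET.parse(ann_file)
            width = int(tree.find('.//size/width').text)
            height = int(tree.find('.//size/height').text)
            for obj in tree.findall('.//object'):
                xmin, ymin, xmax, ymax = (
                    int(obj.find(f'bndbox/{tag}').text)
                    for tag in ('xmin', 'ymin', 'xmax', 'ymax')
                )
                # Check that the box is non-empty and inside the image
                if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
                    issues.append(f"Invalid box in {ann_file}")
        except Exception as e:
            issues.append(f"Error in {ann_file}: {str(e)}")
    return issues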

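For a remote-sensing dataset like DOTA1.5, pay particular attention to:

- Whether small objects have been retained correctly
- Whether objects at tile edges are complete
- Whether category labels map to the right classes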
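One inexpensive way to check the first and last points is to count objects per category in the generated YOLO labels and compare the totals with what you expect from the source annotations. The sketch below assumes one `.txt` label file per tile with the class index as the first field; `class_histogram` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def class_histogram(label_dir, class_list):
    """Count labelled objects per category across all YOLO .txt files in label_dir."""
    counts = Counter()
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            idx = int(parts[0])
            if not 0 <= idx < len(class_list):
                # An out-of-range index usually means the class list is wrong
                raise ValueError(f"Class index {idx} out of range in {label_file}")
            counts[class_list[idx]] += 1
    return counts
```

Because overlapping tiles contain some objects more than once, the totals after cutting are normally somewhat higher than in the original annotations; a category that drops to zero, or an index that falls outside the class list, is the signal worth investigating.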

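6. Performance Optimization and Practical Tips

Efficiency is critical when processing large datasets. Here are a few suggestions:

Parallel processing:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as executor:
    # Consume the result iterator so that exceptions raised in workers surface here
    results = list(executor.map(process_single_image, image_paths))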

Memory optimization:
- Use generators to process files incrementally
- Release memory promptly once it is no longer needed
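As an illustration of the generator approach, the sketch below yields one image/label pair at a time instead of building the full list up front. The function name, the `.png` default and the rule that labels share the image's stem are assumptions based on the directory layout shown earlier.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_image_label_pairs(image_dir, label_dir,
                           image_suffix: str = ".png") -> Iterator[Tuple[Path, Path]]:
    """Lazily yield (image_path, label_path) pairs that exist in both directories."""
    for image_path in Path(image_dir).glob(f"*{image_suffix}"):
        label_path = Path(label_dir) / f"{image_path.stem}.txt"
        if label_path.exists():
            yield image_path, label_path
        # Images without a label file are skipped silently in this sketch
```

The iteration order depends on the file system, so sort the results if you need a reproducible order. Inside the processing loop, opening each image in a `with Image.open(path) as img:` block lets its memory be released before the next pair is read.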

Incremental processing:

import os

processed = set()
if os.path.exists('processed.log'):
    with open('processed.log') as f:
        processed.update(f.read().splitlines())

for img_path in new_images:
    if img_path.name in processed:
        continue
    # Process the image...
    with open('processed.log', 'a') as f:
        f.write(f"{img_path.name}\n")

More robust error handling:

import logging

def safe_process(image_path):
    try:
        process_image(image_path)
        return True
    except Exception as e:
        logging.error(f"Failed {image_path}: {str(e)}")
        return False

Progress display:

from tqdm import tqdm

for img_path in tqdm(image_paths, desc="Processing"):
    process_image(img_path)

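For especially large images (for example, over 10000×10000 pixels), consider the following:

- Read and process the image in blocks
- Consider a dedicated image-processing library that supports windowed reading and writing (for example OpenCV-based tiling, or the windowed reads offered by GDAL and rasterio)
- Increase the overlap so that important objects are not cut apart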

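7. Suggestions for Integrating into a Real Project

When integrating this workflow into a machine-learning project, the following directory structure is recommended:

project/
├── datasets/
│   ├── DOTA1.5/
│   │   ├── images/              # Original images
│   │   ├── labels/              # Original annotations
│   │   ├── processed/           # Processed data
│   │   │   ├── images_1920/     # Cut images
│   │   │   ├── labels_yolo/     # YOLO-format annotations
│   │   │   └── labels_voc/      # VOC-format annotations
│   └── ...
├── scripts/
│   ├── data_processing.py       # The code from this article
│   └── ...
└── ...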

In the training script, a simple configuration switch can support both formats:

if cfg.data_format == 'yolo':
    dataset = YOLODataset(cfg.data_path)
elif cfg.data_format == 'voc':
    dataset = VOCDataset(cfg.data_path)

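For continuous-integration setups, an automated validation step can be added:

import cv2
import pytest

# TEST_IMAGES, get_annotation_path and parse_annotations are project-specific
# and must be supplied by your own test suite

@pytest.mark.parametrize("image_path", TEST_IMAGES)
def test_annotation_consistency(image_path):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # Validate the corresponding annotation file
    ann_path = get_annotation_path(image_path)
    boxes = parse_annotations(ann_path)
    for box in boxes:
        assert 0 <= box[0] < w, "Invalid xmin"
        assert 0 <= box[1] < h, "Invalid ymin"
        # More checks...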

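As a final reminder, back up the processed dataset and record the processing parameters (such as tile size and overlap in pixels); this is essential for reproducing experiments.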
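Recording the parameters can be as simple as writing a small JSON file next to the processed data. The sketch below is one way to do it; the file name `processing_params.json` and the helper name are arbitrary choices, not a convention of any tool.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_processing_params(output_dir, **params):
    """Write the processing parameters and a UTC timestamp to a JSON file."""
    record = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "params": params,
    }
    path = Path(output_dir) / "processing_params.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

A call such as `save_processing_params("datasets/DOTA1.5/processed", window_size=1920, overlap=250)` then leaves a record that can be committed or archived together with the dataset backup.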
