Processing the DOTA1.5 Dataset in Practice: Large-Image Cutting and YOLO/VOC Format Conversion with Python Scripts
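In remote-sensing object detection, the DOTA1.5 dataset has become an important benchmark for validating algorithms because of its high resolution and very large images. Feeding the original images straight into a model, however, runs into problems such as running out of GPU memory and missing small objects. This article walks through three core tasks in Python: sliding-window cutting into 1920×1920 patches, conversion to YOLO format, and generation of VOC format, ending with a standard dataset that is ready for training.
1. Environment Setup and Understanding the Data
Before processing DOTA1.5, set up a suitable development environment and get familiar with how the data is organized. Python 3.8+ is recommended, together with the following key libraries: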
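
    pip install pillow opencv-python numpy matplotlib lxml

The dataset directory structure typically looks like this:

    DOTA-v1.5/
    ├── images/
    │   ├── P0001.png
    │   └── ...
    └── labelTxt/
        ├── P0001.txt
        └── ...

In the original annotation files (.txt), each line describes one object with eight coordinate values (the x, y positions of four vertices), followed by the category label and a difficulty flag. This article reduces each object to an HBB (horizontal bounding box) by taking the minimum and maximum of the vertex coordinates. For example:

    1468 1065 1664 1065 1664 1121 1468 1121 large-vehicle 0

Note: DOTA1.5 contains 16 categories, from 'plane' to 'container-crane', so prepare the class list in advance.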
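Every later step needs a class list and a way to read one annotation row, so a minimal sketch of both follows. The names `DOTA_V15_CLASSES` and `parse_dota_line` are hypothetical, and the ordering of the list is an assumption: the index order only has to stay the same between conversion and training.

```python
# Hypothetical names; the class order is an assumption, not an official index.
DOTA_V15_CLASSES = [
    "plane", "baseball-diamond", "bridge", "ground-track-field",
    "small-vehicle", "large-vehicle", "ship", "tennis-court",
    "basketball-court", "storage-tank", "soccer-ball-field", "roundabout",
    "harbor", "swimming-pool", "helicopter", "container-crane",
]


def parse_dota_line(line):
    """Parse one annotation row into (points, category, difficult).

    Returns None for rows that do not hold 8 numeric coordinates plus a
    category, such as metadata lines at the top of a label file.
    """
    parts = line.split()
    if len(parts) < 9:
        return None
    try:
        coords = [float(v) for v in parts[:8]]
    except ValueError:
        return None
    points = list(zip(coords[0::2], coords[1::2]))  # four (x, y) vertices
    difficult = int(parts[9]) if len(parts) > 9 else 0
    return points, parts[8], difficult
```

Calling `parse_dota_line` on the sample row above yields four vertices, the category `large-vehicle`, and a difficulty flag of 0.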
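2. Large-Image Cutting: Strategy and Implementation
Images larger than 1920×1920 pixels are cut with a sliding window. The key parameters are:
- Window size: 1920×1920
- Overlap: 250 pixels (so that objects on a window border appear whole in a neighboring window)
- Edge handling: the last window in each row and column is shifted back so that it is still a full-size patch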
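To make the three parameters concrete, here is a small sketch that computes where the windows start along one axis. The function name `window_origins` is hypothetical; the logic assumes the step is `window - overlap` and that the final window is pulled back to end exactly at the image border.

```python
def window_origins(length, window=1920, overlap=250):
    """Return the start offsets of all windows along one image axis."""
    if length <= window:
        return [0]  # the axis fits in a single window
    step = window - overlap
    # Regular windows that fit without touching the far border...
    origins = list(range(0, length - window, step))
    # ...plus one final window aligned to the border.
    origins.append(length - window)
    return origins


print(window_origins(4000))  # [0, 1670, 2080]
```

For a 4000-pixel axis the last two windows overlap by 1510 pixels rather than 250, which is the price of keeping every patch at full size.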
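Core code logic:

    from pathlib import Path
    from PIL import Image

    def sliding_window_cut(image_path, output_dir, window_size=1920, overlap=250):
        image_path, output_dir = Path(image_path), Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        img = Image.open(image_path)
        width, height = img.size
        if width <= window_size and height <= window_size:
            return  # skip images that do not need cutting
        step = window_size - overlap
        seen = set()  # edge windows can collapse onto the same origin
        for i in range(0, width, step):
            for j in range(0, height, step):
                # shift the last window back so it stays full size
                left = max(0, min(i, width - window_size))
                upper = max(0, min(j, height - window_size))
                if (left, upper) in seen:
                    continue
                seen.add((left, upper))
                right = min(left + window_size, width)
                lower = min(upper + window_size, height)
                patch = img.crop((left, upper, right, lower))
                # name the patch by its true offset so labels can be shifted later
                patch.save(output_dir / f"{image_path.stem}_{left}_{upper}.png")

In practice the annotations have to be converted in step with the images. When an image is cut, the original annotations need to be adjusted accordingly: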
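- Filter out objects that lie entirely outside the current window
- Adjust the coordinates of the objects that remain (subtract the window offset)
- Handle objects that are cut by the window border (clip them so the annotation stays valid)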
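The three steps above can be sketched as a single function. The name `shift_boxes_to_window` and the `min_visible` threshold are assumptions introduced here: dropping boxes that keep only a small fraction of their area is one common policy, not something the steps above prescribe.

```python
def shift_boxes_to_window(boxes, left, upper, window_w, window_h, min_visible=0.3):
    """Map (xmin, ymin, xmax, ymax, cls) boxes from image to patch coordinates.

    min_visible is a hypothetical threshold: a box is dropped when the part
    inside the window is a smaller fraction of its original area.
    """
    kept = []
    for xmin, ymin, xmax, ymax, cls in boxes:
        area = (xmax - xmin) * (ymax - ymin)
        if area <= 0:
            continue  # degenerate box
        # Subtract the window offset, then clip to the patch bounds.
        cx0 = max(xmin - left, 0)
        cy0 = max(ymin - upper, 0)
        cx1 = min(xmax - left, window_w)
        cy1 = min(ymax - upper, window_h)
        if cx1 <= cx0 or cy1 <= cy0:
            continue  # entirely outside this window
        visible = (cx1 - cx0) * (cy1 - cy0) / area
        if visible < min_visible:
            continue  # only a sliver remains
        kept.append((cx0, cy0, cx1, cy1, cls))
    return kept
```

Passing the `left` and `upper` values used when the patch was cropped keeps images and labels aligned; the patch width and height go in as `window_w` and `window_h`.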
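3. YOLO Format Conversion in Detail
YOLO format describes each object by its normalized center coordinates, width, and height. The conversion formulas are:

    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width    = (x_max - x_min) / image_width
    height   = (y_max - y_min) / image_height

A complete conversion function, with the image size passed in so that the values can be normalized:

    import numpy as np

    def dota_to_yolo(ann_file, output_file, class_list, img_w, img_h):
        with open(ann_file) as f:
            # the first two lines are skipped as file-level metadata
            lines = [line.strip() for line in f.readlines()[2:]]
        objects = []
        for line in lines:
            parts = line.split()
            if len(parts) < 9 or parts[8] not in class_list:
                continue  # skip malformed rows and unknown categories
            points = np.array(parts[:8], dtype=float)
            cls = parts[8]
            # horizontal bounding box from the four vertices
            xmin, ymin = points[::2].min(), points[1::2].min()
            xmax, ymax = points[::2].max(), points[1::2].max()
            # normalize by the image size to get YOLO values in [0, 1]
            x_center = (xmin + xmax) / 2 / img_w
            y_center = (ymin + ymax) / 2 / img_h
            width = (xmax - xmin) / img_w
            height = (ymax - ymin) / img_h
            objects.append((cls, x_center, y_center, width, height))
        # write the YOLO label file
        with open(output_file, 'w') as f:
            for cls, xc, yc, w, h in objects:
                f.write(f"{class_list.index(cls)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

Tip: YOLO coordinates are relative values between 0 and 1, so they must be recomputed against the patch size after the image has been cut.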
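As a worked check of the formulas, the sketch below converts the sample box from section 1 (x from 1468 to 1664, y from 1065 to 1121), assuming it sits in a 1920×1920 patch with no offset. The function name `box_to_yolo_row` is hypothetical and the class index 5 is an arbitrary illustrative value.

```python
def box_to_yolo_row(cls_idx, xmin, ymin, xmax, ymax, img_w, img_h):
    """Format one pixel-coordinate box as a YOLO label row."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{cls_idx} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(box_to_yolo_row(5, 1468, 1065, 1664, 1121, 1920, 1920))
# 5 0.815625 0.569271 0.102083 0.029167
```

All four values fall between 0 and 1, which is a quick sanity check that the right image size was used.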
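4. Generating VOC Format
PASCAL VOC format stores annotations in XML files that hold each object's category and absolute pixel coordinates. Keep the following in mind during conversion:
- The image size information must be accurate
- Each object must contain a complete bounding-box description
- Extra fields can be added (such as difficult and truncated)
Core structure of the converter class: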
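
    from pathlib import Path
    from xml.dom.minidom import Document

    import cv2

    class YOLO2VOCConverter:
        def __init__(self, classes):
            self.classes = classes  # list of class names

        def convert_single_file(self, yolo_file, image_file, output_xml):
            # read the image size
            image_file = Path(image_file)
            img = cv2.imread(str(image_file))
            if img is None:
                raise FileNotFoundError(f"Cannot read image: {image_file}")
            h, w = img.shape[:2]
            # create the XML document structure
            doc = Document()
            annotation = doc.createElement("annotation")
            doc.appendChild(annotation)
            # add basic image information
            self._add_basic_info(doc, annotation, image_file.name, w, h)
            # process each YOLO annotation row
            with open(yolo_file) as f:
                for line in f:
                    if not line.strip():
                        continue
                    cls_idx, xc, yc, bw, bh = map(float, line.split())
                    self._add_object(doc, annotation, int(cls_idx), xc, yc, bw, bh, w, h)
            # save the XML file
            with open(output_xml, 'w') as f:
                doc.writexml(f, indent='', addindent='  ', newl='\n')

        @staticmethod
        def _add_text(doc, parent, tag, text):
            node = doc.createElement(tag)
            node.appendChild(doc.createTextNode(str(text)))
            parent.appendChild(node)

        def _add_basic_info(self, doc, parent, filename, width, height):
            # filename and size; depth is assumed to be 3 (cv2.imread default)
            self._add_text(doc, parent, "filename", filename)
            size = doc.createElement("size")
            parent.appendChild(size)
            for tag, value in (("width", width), ("height", height), ("depth", 3)):
                self._add_text(doc, size, tag, value)

        def _add_object(self, doc, parent, cls_idx, xc, yc, bw, bh, img_w, img_h):
            # normalized center/size -> absolute pixel corners, clipped to the image
            obj = doc.createElement("object")
            parent.appendChild(obj)
            self._add_text(doc, obj, "name", self.classes[cls_idx])
            self._add_text(doc, obj, "difficult", 0)
            bndbox = doc.createElement("bndbox")
            obj.appendChild(bndbox)
            self._add_text(doc, bndbox, "xmin", max(0, round((xc - bw / 2) * img_w)))
            self._add_text(doc, bndbox, "ymin", max(0, round((yc - bh / 2) * img_h)))
            self._add_text(doc, bndbox, "xmax", min(img_w, round((xc + bw / 2) * img_w)))
            self._add_text(doc, bndbox, "ymax", min(img_h, round((yc + bh / 2) * img_h)))

In real projects, adding multiprocessing is recommended to speed up conversion: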
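
    from multiprocessing import Pool

    def batch_convert(args):
        # classes must be defined at module level so worker processes can see it
        converter = YOLO2VOCConverter(classes)
        converter.convert_single_file(*args)

    if __name__ == '__main__':
        # file_pairs: list of (yolo_file, image_file, output_xml) tuples
        with Pool(processes=4) as pool:
            pool.map(batch_convert, file_pairs)

5. Quality Validation and Visualization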
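After format conversion, the annotations must be checked for correctness. Two validation methods are recommended:
Method 1: drawing the bounding boxes back onto the image

    import cv2
    import matplotlib.pyplot as plt

    def visualize_annotations(image_path, annotation_path, fmt='voc'):
        img = cv2.imread(str(image_path))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # parse_voc_annotation / parse_yolo_annotation are helpers you supply;
        # each returns [xmin, ymin, xmax, ymax] boxes in pixels
        if fmt == 'voc':
            boxes = parse_voc_annotation(annotation_path)
        else:  # yolo
            boxes = parse_yolo_annotation(annotation_path, img.shape[:2])
        # draw the bounding boxes (cv2.rectangle needs integer corners)
        for box in boxes:
            x0, y0, x1, y1 = (int(v) for v in box[:4])
            cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
        plt.imshow(img)
        plt.show()

Method 2: statistical validation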
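
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def validate_annotations(annotations_dir, image_dir):
        # image_dir is not used by the checks below
        issues = []
        for ann_file in Path(annotations_dir).glob('*.xml'):
            try:
                tree = ET.parse(ann_file)
                width = int(tree.find('.//size/width').text)
                height = int(tree.find('.//size/height').text)
                for obj in tree.findall('.//object'):
                    xmin, ymin, xmax, ymax = (
                        int(obj.find(f'bndbox/{tag}').text)
                        for tag in ('xmin', 'ymin', 'xmax', 'ymax')
                    )
                    # check that the box is non-empty and inside the image
                    if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
                        issues.append(f"Invalid box in {ann_file}")
            except Exception as e:
                issues.append(f"Error in {ann_file}: {e}")
        return issues

For a remote-sensing dataset like DOTA1.5, pay particular attention to the following:
- Whether small objects have been kept correctly
- Whether objects at patch borders are complete
- Whether category labels map to the right classes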
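The visualization function in Method 1 calls `parse_voc_annotation` and `parse_yolo_annotation` without defining them. One possible sketch of these helpers follows; the return format (a list of `[xmin, ymin, xmax, ymax]` pixel boxes) is an assumption chosen to match how the boxes are drawn there.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_path):
    """Return [xmin, ymin, xmax, ymax] pixel boxes from a VOC XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        boxes.append([int(float(bnd.find(tag).text))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


def parse_yolo_annotation(txt_path, image_shape):
    """Return pixel boxes from a YOLO label file; image_shape is (height, width)."""
    img_h, img_w = image_shape
    boxes = []
    with open(txt_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed rows
            xc, yc, bw, bh = (float(v) for v in parts[1:])
            # Normalized center/size back to absolute pixel corners.
            boxes.append([
                int(round((xc - bw / 2) * img_w)),
                int(round((yc - bh / 2) * img_h)),
                int(round((xc + bw / 2) * img_w)),
                int(round((yc + bh / 2) * img_h)),
            ])
    return boxes
```

Note that `image_shape` follows the `(height, width)` order of `img.shape[:2]`, which is the reverse of the `(width, height)` order that PIL reports.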
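6. Performance Optimization and Practical Tips
Efficiency is critical when processing a large dataset. Here are several optimization suggestions:
Parallel processing:

    from concurrent.futures import ThreadPoolExecutor

    # process_single_image is your own per-image function
    with ThreadPoolExecutor(max_workers=8) as executor:
        # list() consumes the results so exceptions raised in workers surface here
        list(executor.map(process_single_image, image_paths))

Memory optimization: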
- Use generators to process files one at a time
- Release memory promptly once it is no longer needed
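A generator-based file walker might look like the sketch below. The function name `iter_image_label_pairs` and the assumption that each label shares its image's stem with a `.txt` suffix are illustrative choices, not part of the dataset specification.

```python
from pathlib import Path


def iter_image_label_pairs(image_dir, label_dir, image_suffix=".png"):
    """Yield (image_path, label_path) one pair at a time.

    Only file paths are held in memory; the caller opens each image when it
    is needed, so pixel data for one image can be freed before the next.
    """
    label_dir = Path(label_dir)
    for image_path in sorted(Path(image_dir).glob(f"*{image_suffix}")):
        label_path = label_dir / f"{image_path.stem}.txt"
        if label_path.exists():  # skip images without annotations
            yield image_path, label_path
```

Iterating with `for image_path, label_path in iter_image_label_pairs(...)` and opening the image inside the loop body keeps at most one decoded image alive at a time.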
Incremental processing:

    import os

    processed = set()
    if os.path.exists('processed.log'):
        with open('processed.log') as f:
            processed.update(f.read().splitlines())

    for img_path in new_images:
        if img_path.name in processed:
            continue
        # process the image...
        with open('processed.log', 'a') as f:
            f.write(f"{img_path.name}\n")

Stronger error handling:

    import logging

    def safe_process(image_path):
        try:
            process_image(image_path)
            return True
        except Exception as e:
            logging.error(f"Failed {image_path}: {e}")
            return False

Progress display:

    from tqdm import tqdm

    for img_path in tqdm(image_paths, desc="Processing"):
        process_image(img_path)

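For especially large images (for example, larger than 10000×10000 pixels), the recommendations are:
- Read and process the image in chunks
- Consider a dedicated image-processing library with windowed read/write support (for example OpenCV-based tiling; note that `cv2.imread` by itself decodes the whole image into memory)
- Increase the overlap so that important objects are not cut apart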
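7. Integrating into a Real Project
When integrating this pipeline into a machine-learning project, the following directory structure is recommended:

    project/
    ├── datasets/
    │   ├── DOTA1.5/
    │   │   ├── images/            # original images
    │   │   ├── labels/            # original annotations
    │   │   ├── processed/         # processed data
    │   │   │   ├── images_1920/   # cut images
    │   │   │   ├── labels_yolo/   # YOLO-format annotations
    │   │   │   └── labels_voc/    # VOC-format annotations
    │   └── ...
    ├── scripts/
    │   ├── data_processing.py     # the code from this article
    │   └── ...
    └── ...

In the training script, a simple configuration switch can support both formats:

    # YOLODataset and VOCDataset stand for your own dataset classes
    if cfg.data_format == 'yolo':
        dataset = YOLODataset(cfg.data_path)
    elif cfg.data_format == 'voc':
        dataset = VOCDataset(cfg.data_path)

For continuous-integration setups, an automated validation step can be added: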
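
    import cv2
    import pytest

    # TEST_IMAGES, get_annotation_path and parse_annotations are project-specific
    @pytest.mark.parametrize("image_path", TEST_IMAGES)
    def test_annotation_consistency(image_path):
        img = cv2.imread(str(image_path))
        h, w = img.shape[:2]
        # validate the matching annotation file
        ann_path = get_annotation_path(image_path)
        boxes = parse_annotations(ann_path)
        for box in boxes:
            assert 0 <= box[0] < w, "Invalid xmin"
            assert 0 <= box[1] < h, "Invalid ymin"
            # more checks...

As a final reminder, back up the processed dataset and record the processing parameters (such as the cutting size and the overlap in pixels), since these are essential for reproducing experiments.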
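One lightweight way to record the parameters is a JSON file stored next to the processed data. The sketch below is an assumption about how this could be done: the function name `save_processing_params`, the file layout, and the timestamp field are all illustrative.

```python
import json
from datetime import datetime, timezone


def save_processing_params(path, window_size=1920, overlap=250, **extra):
    """Write the cutting parameters (plus any extra fields) to a JSON file."""
    params = {
        "window_size": window_size,
        "overlap": overlap,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        **extra,  # e.g. a class list or a script version
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2, sort_keys=True)
    return params
```

Keeping this file in the same folder as the cut images means a backup of the folder carries its own provenance.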
