
Step-by-Step Tutorial: Converting the ICDAR2015 Text Localization Dataset to COCO Format with a Python Script (Full Code Included)

news 2026/6/22 5:42:01

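From ICDAR2015 to COCO: A Practical Guide to Converting Text Localization Dataset Formats

Text detection has long been an active research topic in computer vision. ICDAR2015 is a classic benchmark for scene text detection, but its annotation format differs significantly from the COCO format commonly used by mainstream detection frameworks such as MMDetection and Detectron2. This article walks through converting ICDAR2015 annotations to COCO with a Python script and provides complete code that can be integrated into a production pipeline.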

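1. Understanding the Format Differences

1.1 Parsing the Original ICDAR2015 Annotations

ICDAR2015 stores its annotations in plain text files (.txt). Each line describes one text instance in the following format:

    x1,y1,x2,y2,x3,y3,x4,y4,transcription

A typical example:

    377,117,463,117,465,130,378,130,Genaxis Theatre
    493,115,519,115,519,131,493,131,[06]
    374,155,409,155,409,170,374,170,###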

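Key characteristics:

The first 8 numbers are the coordinates of the four vertices of the quadrilateral text box (in clockwise or counterclockwise order)
Everything after the eighth comma is the transcription, which may itself contain commas; ### marks blurry or unreadable text that should be ignored
Each image has one matching .txt annotation file, named gt_<image name>.txt in the directory layout the conversion script in this article assumes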
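The line format described above can be parsed in a few lines of Python. The following is a minimal sketch rather than the converter used later in this article; the function name `parse_icdar_line` is hypothetical. It splits on at most eight commas so that commas inside the transcription survive, and it strips a leading BOM in case the file was read without `utf-8-sig`.

```python
def parse_icdar_line(line):
    """Parse one ICDAR2015 annotation line into (points, transcription, ignore)."""
    # Remove a possible UTF-8 BOM first, then surrounding whitespace.
    parts = line.lstrip("\ufeff").strip().split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(v) for v in parts[:8]]
    # Pair up x and y values: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
    points = list(zip(coords[0::2], coords[1::2]))
    text = parts[8]
    return points, text, text == "###"


points, text, ignore = parse_icdar_line(
    "377,117,463,117,465,130,378,130,Genaxis Theatre"
)
print(points, text, ignore)
```

Limiting the split to eight separators is the design choice that matters here: a plain `split(",")` would break a transcription such as `a,b` into two fields.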

1.2 The COCO Format in Detail

COCO organizes its data as a JSON document with three core parts:

    {
      "images": [
        {
          "file_name": "img_1.jpg",
          "height": 720,
          "width": 1280,
          "id": 0
        }
      ],
      "categories": [
        {"id": 1, "name": "text"}
      ],
      "annotations": [
        {
          "iscrowd": 0,
          "category_id": 1,
          "bbox": [216, 222, 93, 37],
          "area": 2054.0,
          "segmentation": [[217, 235, 304, 222, 309, 243, 216, 259]],
          "image_id": 0,
          "id": 0
        }
      ]
    }

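Key field comparison:

Attribute	ICDAR2015	COCO format
Geometry	Quadrilateral vertex coordinates	Polygon segmentation + bbox ([x, y, width, height])
Ignored annotations	`###` as the transcription	`iscrowd=1` field
Image information	Stored in separate files	Part of the images array in the JSON
Category information	Implicit (everything is text)	Explicit categories definition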
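To make the geometry mapping concrete, the sketch below derives the `bbox` and `area` fields from a flat polygon using only the standard library. The function name `polygon_to_bbox_and_area` is hypothetical, and the shoelace formula is used here in place of Shapely purely for illustration. Applied to the segmentation from the JSON example above, it reproduces that example's `bbox` and `area` values.

```python
def polygon_to_bbox_and_area(segmentation):
    """Derive a COCO bbox [x, y, w, h] and area from a flat [x1, y1, ...] polygon."""
    xs = segmentation[0::2]
    ys = segmentation[1::2]
    bbox = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
    # Shoelace formula: half the absolute value of the summed cross products.
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return bbox, abs(twice_area) / 2.0


bbox, area = polygon_to_bbox_and_area([217, 235, 304, 222, 309, 243, 216, 259])
print(bbox, area)  # [216, 222, 93, 37] 2054.0
```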

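2. Conversion Pipeline Design and Core Code

2.1 Overall Conversion Flow

Traverse the dataset directory: collect the paths of all images and their annotation files
Parse the original annotations: read each .txt file and extract the quadrilateral coordinates and the transcription
Convert the geometry: turn each quadrilateral into the polygon representation COCO expects
Map the fields: translate ICDAR fields into their COCO counterparts
Serialize to JSON: assemble the data in COCO layout and write it out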

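2.2 Key Code

2.2.1 Data Structure Definition

First, define the data structure used during conversion:

    from shapely.geometry import Polygon
    import numpy as np
    import mmcv
    import os.path as osp


    class ICDAR15ToCOCOConverter:
        def __init__(self, dataset_root, output_dir):
            self.dataset_root = dataset_root  # root directory of the ICDAR2015 data
            self.output_dir = output_dir      # directory the COCO JSON is written to
            self.categories = [{"id": 1, "name": "text"}]  # single "text" category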

2.2.2 Core Annotation Parsing Method

    # Methods of ICDAR15ToCOCOConverter
    def parse_icdar_annotation(self, gt_file):
        """Parse an ICDAR2015 annotation file."""
        # utf-8-sig drops a leading BOM if the file has one
        with open(gt_file, 'r', encoding='utf-8-sig') as f:
            lines = f.readlines()

        annotations = []
        for line in lines:
            parts = line.strip().split(',')
            if len(parts) < 8:
                continue

            # Extract the coordinates; re-join the rest so commas in the text survive
            coords = list(map(int, parts[:8]))
            text = ','.join(parts[8:]) if len(parts) > 8 else ''

            # Build the polygon geometry object
            polygon = Polygon(np.array(coords).reshape(-1, 2))

            # Build the COCO annotation
            annotation = {
                "iscrowd": 1 if text == "###" else 0,
                "category_id": 1,
                "bbox": self._get_bbox_from_polygon(polygon),
                "area": polygon.area,
                "segmentation": [coords],
                "image_id": None,  # filled in later
                "id": None         # filled in later
            }
            annotations.append(annotation)

        return annotations

    def _get_bbox_from_polygon(self, polygon):
        """Return the COCO bbox [x, y, width, height] of a polygon."""
        min_x, min_y, max_x, max_y = polygon.bounds
        return [min_x, min_y, max_x - min_x, max_y - min_y]

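2.2.3 Image Information Processing

    # Method of ICDAR15ToCOCOConverter
    def process_image(self, img_path, gt_path):
        """Process a single image and its annotations."""
        img = mmcv.imread(img_path)
        img_info = {
            # stored relative to the dataset root, e.g. imgs/training/img_1.jpg
            "file_name": osp.relpath(img_path, self.dataset_root),
            "height": img.shape[0],
            "width": img.shape[1],
            "id": None  # filled in later
        }
        annotations = self.parse_icdar_annotation(gt_path)
        return img_info, annotations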

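3. Full Conversion Script

3.1 Script Organization

The following directory layout is recommended:

    icdar15_to_coco/
    ├── converter.py        # main conversion script
    ├── utils.py            # helper functions
    ├── requirements.txt    # dependencies
    └── configs/            # configuration files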

3.2 Main Conversion Script

    import os
    import os.path as osp
    import json
    from concurrent.futures import ThreadPoolExecutor

    from tqdm import tqdm


    class ICDAR15ToCOCOConverter:
        # ... (methods defined earlier)

        def convert(self, splits=('training', 'test')):
            """Run the full conversion. All splits are merged into one JSON file."""
            coco_data = {
                "images": [],
                "annotations": [],
                "categories": self.categories
            }
            annotation_id = 0

            for split in splits:
                img_dir = osp.join(self.dataset_root, 'imgs', split)
                gt_dir = osp.join(self.dataset_root, 'annotations', split)
                # sorted() keeps image ids stable across runs
                img_files = sorted(f for f in os.listdir(img_dir) if f.endswith('.jpg'))

                with ThreadPoolExecutor(max_workers=8) as executor:
                    futures = []
                    for img_file in img_files:
                        img_path = osp.join(img_dir, img_file)
                        gt_file = osp.join(gt_dir, f'gt_{osp.splitext(img_file)[0]}.txt')
                        futures.append(executor.submit(self.process_image, img_path, gt_file))

                    # Collect results in submission order and assign ids
                    for future in tqdm(futures, desc=f"Processing {split} set"):
                        img_info, annotations = future.result()
                        img_id = len(coco_data["images"])
                        img_info["id"] = img_id
                        coco_data["images"].append(img_info)

                        for ann in annotations:
                            ann["id"] = annotation_id
                            ann["image_id"] = img_id
                            coco_data["annotations"].append(ann)
                            annotation_id += 1

            # Create the output directory if it does not exist yet
            os.makedirs(self.output_dir, exist_ok=True)
            output_path = osp.join(self.output_dir, 'icdar2015_coco_format.json')
            with open(output_path, 'w') as f:
                json.dump(coco_data, f, indent=2)

            return output_path

3.3 Usage Example

    if __name__ == '__main__':
        converter = ICDAR15ToCOCOConverter(
            dataset_root='/path/to/icdar2015',
            output_dir='/path/to/output'
        )
        output_json = converter.convert()
        print(f"Conversion finished, result saved to: {output_json}")
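Because `convert` writes every split into a single JSON file, a training setup that expects one annotation file per split needs the result divided again. The following is a minimal sketch of one way to do that; the helper name `split_coco_by_prefix` is hypothetical, and it assumes `file_name` values are relative to the dataset root (for example `imgs/training/img_1.jpg`), as `process_image` produces them.

```python
def split_coco_by_prefix(coco, prefix):
    """Return a new COCO dict with only the images whose file_name starts with prefix."""
    images = [
        img for img in coco["images"]
        # Normalize Windows separators before comparing.
        if img["file_name"].replace("\\", "/").startswith(prefix)
    ]
    keep_ids = {img["id"] for img in images}
    annotations = [a for a in coco["annotations"] if a["image_id"] in keep_ids]
    return {
        "images": images,
        "annotations": annotations,
        "categories": coco["categories"],
    }


# Tiny hand-made example, not real dataset content.
merged = {
    "images": [
        {"id": 0, "file_name": "imgs/training/img_1.jpg", "height": 720, "width": 1280},
        {"id": 1, "file_name": "imgs/test/img_1.jpg", "height": 720, "width": 1280},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1},
        {"id": 1, "image_id": 1, "category_id": 1},
    ],
    "categories": [{"id": 1, "name": "text"}],
}
train_only = split_coco_by_prefix(merged, "imgs/training/")
print(len(train_only["images"]), len(train_only["annotations"]))  # 1 1
```

An alternative with the same effect is to call `convert` once per split and give each run its own output file name.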

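4. Advanced Tips and Optimization Suggestions

4.1 Performance Optimization Strategies

Parallel processing: use ThreadPoolExecutor to speed up image processing

    from concurrent.futures import ThreadPoolExecutor
    from tqdm import tqdm

    # process_func receives one element of file_pairs per call
    with ThreadPoolExecutor(max_workers=8) as executor:
        results = list(tqdm(executor.map(process_func, file_pairs), total=len(file_pairs)))

Memory optimization: for large datasets, process the data in batches and write to the output file incrementally
Caching: add a checksum for images that have already been processed so they are not processed again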
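The caching idea can be sketched with the standard library alone. The example below is an assumption about how such a check might look, not part of the converter in this article: the names `file_md5` and `files_needing_work` and the JSON cache file are all hypothetical. It stores one MD5 checksum per path and reports only the files whose content changed since the last run.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_work(paths, cache_path):
    """Return paths whose checksum differs from the cached one, then update the cache."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
    changed = []
    for path in paths:
        checksum = file_md5(path)
        if cache.get(path) != checksum:
            changed.append(path)
            cache[path] = checksum
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2)
    return changed
```

Note that this sketch records the new checksum before any processing happens; a more careful version would update the cache only after a file has been converted successfully.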

4.2 Quality Checks

After conversion, the following validation is recommended:

    import json


    def validate_coco_json(json_path):
        """Validate the generated COCO file."""
        with open(json_path) as f:
            data = json.load(f)

        # Basic structure check
        assert all(k in data for k in ['images', 'annotations', 'categories'])

        # Every annotation must reference an existing image
        image_ids = {img['id'] for img in data['images']}
        for ann in data['annotations']:
            assert ann['image_id'] in image_ids

        # Geometric validity check
        for ann in data['annotations']:
            assert len(ann['segmentation'][0]) >= 8  # at least 4 points
            assert ann['area'] > 0

        print("Validation passed: the COCO file is valid")
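Beyond pass/fail assertions, a few summary counts make it easy to compare the converted file against the source dataset by eye. This is a small sketch under the assumption that the data follows the structure shown in section 1.2; the function name `summarize_coco` is hypothetical.

```python
from collections import Counter


def summarize_coco(data):
    """Return basic counts that can be compared against the source dataset."""
    per_image = Counter(ann["image_id"] for ann in data["annotations"])
    ignored = sum(1 for ann in data["annotations"] if ann.get("iscrowd") == 1)
    return {
        "images": len(data["images"]),
        "annotations": len(data["annotations"]),
        "ignored": ignored,
        # Counter returns 0 for image ids that never appear in the annotations.
        "images_without_annotations": sum(
            1 for img in data["images"] if per_image[img["id"]] == 0
        ),
    }


# Tiny hand-made example, not real dataset content.
sample = {
    "images": [{"id": 0}, {"id": 1}],
    "annotations": [
        {"id": 0, "image_id": 0, "iscrowd": 0},
        {"id": 1, "image_id": 0, "iscrowd": 1},
    ],
    "categories": [{"id": 1, "name": "text"}],
}
summary = summarize_coco(sample)
print(summary)
```

The `ignored` count should equal the number of `###` lines in the source annotation files, which gives a quick cross-check of the `iscrowd` mapping.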

4.3 Visual Comparison

Use the following code to compare the original annotations with the converted ones:

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt


    def visualize_comparison(img_path, icdar_gt_path, coco_annotations):
        """Draw the original and the converted annotations on the same image."""
        img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)

        # Draw the original ICDAR annotations (utf-8-sig drops a leading BOM, if any)
        with open(icdar_gt_path, encoding='utf-8-sig') as f:
            for line in f:
                parts = line.strip().split(',')
                if len(parts) < 8:
                    continue  # skip blank or malformed lines
                coords = list(map(int, parts[:8]))
                # cv2.polylines expects int32 points
                pts = np.array(coords, dtype=np.int32).reshape(-1, 2)
                cv2.polylines(img, [pts], isClosed=True, color=(255, 0, 0), thickness=2)

        # Draw the COCO annotations
        for ann in coco_annotations:
            seg = ann['segmentation'][0]
            pts = np.array(seg).reshape(-1, 2).astype(np.int32)
            cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
            bbox = list(map(int, ann['bbox']))
            cv2.rectangle(img, (bbox[0], bbox[1]),
                          (bbox[0] + bbox[2], bbox[1] + bbox[3]), (0, 0, 255), 1)

        plt.figure(figsize=(12, 8))
        plt.imshow(img)
        plt.title("Red: ICDAR | Green: COCO | Blue: COCO bbox")
        plt.axis('off')
        plt.show()

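5. Integration in Practice

5.1 Integrating with MMDetection

The converted dataset can be used for MMDetection training:

    # configs/textdet/custom_config.py (MMDetection 2.x-style fields)
    dataset_type = 'CocoDataset'
    data_root = 'data/icdar2015/'
    train = dict(
        type=dataset_type,
        classes=('text',),  # override the default 80 COCO class names
        ann_file=data_root + 'icdar2015_coco_format.json',
        # file_name in the JSON is already relative to the dataset root
        # (e.g. imgs/training/img_1.jpg), so the prefix is the root itself
        img_prefix=data_root,
        pipeline=train_pipeline  # defined elsewhere in the config
    )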

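5.2 Troubleshooting Common Problems

Problem 1: coordinates out of bounds

Solution: add a bounds check before conversion

    # even indices are x (clamped to the width), odd indices are y (clamped to the height)
    coords = [
        max(0, min(coord, img_width if i % 2 == 0 else img_height))
        for i, coord in enumerate(coords)
    ]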
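The same bounds check can be wrapped in a small function so it is easy to try on a concrete case. This is only an illustrative sketch; the name `clamp_coords` is hypothetical, and the 1280x720 image size is taken from the JSON example in section 1.2.

```python
def clamp_coords(coords, img_width, img_height):
    """Clamp a flat [x1, y1, x2, y2, ...] list to the image rectangle."""
    clamped = []
    for i, value in enumerate(coords):
        upper = img_width if i % 2 == 0 else img_height  # even index: x, odd index: y
        clamped.append(max(0, min(value, upper)))
    return clamped


print(clamp_coords([-5, 10, 1300, 10, 1300, 730, -5, 730], 1280, 720))
# [0, 10, 1280, 10, 1280, 720, 0, 720]
```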

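Problem 2: invalid polygons

Solution: use Shapely for geometric validation

    import numpy as np
    from shapely.geometry import Polygon
    from shapely.validation import make_valid

    # coords is the flat list of 8 values, so reshape it into (x, y) pairs first
    polygon = Polygon(np.array(coords).reshape(-1, 2))
    if not polygon.is_valid:
        # make_valid may return a MultiPolygon or GeometryCollection instead of a Polygon
        polygon = make_valid(polygon)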

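Problem 3: inconsistent text direction

Solution: unify the vertex order

    from scipy.spatial import ConvexHull

    def sort_vertices(pts):
        # pts: (N, 2) NumPy array. For 2-D input, hull.vertices is in counterclockwise order.
        # A non-convex quadrilateral loses its interior vertex here.
        hull = ConvexHull(pts)
        return pts[hull.vertices]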
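When the only goal is a consistent winding direction, a signed-area test is a lighter alternative that keeps all four vertices even for non-convex quadrilaterals. This is a different technique from the convex-hull approach above, sketched here with hypothetical function names (`signed_area`, `ensure_clockwise`) and assuming image coordinates, where the y axis points down.

```python
def signed_area(points):
    """Shoelace sum over (x, y) points; positive means clockwise when y points down."""
    n = len(points)
    total = 0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def ensure_clockwise(points):
    """Return the points in clockwise order (image coordinates), keeping the first vertex."""
    if signed_area(points) < 0:
        # Reverse the traversal direction but keep the starting vertex in place.
        return [points[0]] + points[:0:-1]
    return list(points)


counterclockwise = [(0, 0), (0, 5), (10, 5), (10, 0)]
print(ensure_clockwise(counterclockwise))
# [(0, 0), (10, 0), (10, 5), (0, 5)]
```

Keeping the first vertex fixed preserves whatever starting-corner convention the source annotation used; only the direction of travel changes.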
