
Your First Instance Segmentation Project: From Labelme Annotation to MMDetection Training (COCO Format in Practice)

news 2026/6/26 19:31:33

Building an Instance Segmentation Project from Scratch: A Complete Guide to Labelme Annotation and MMDetection Training

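When a computer needs to recognize specific objects in an image and trace their outlines precisely, instance segmentation is the tool for the job. Unlike plain object detection, which only localizes objects with bounding boxes, instance segmentation also delineates the shape of each object with a pixel-level mask. The technique is widely used in medical image analysis, autonomous driving, and industrial quality inspection. This article walks through a complete instance segmentation project, from data annotation and format conversion to model training, and finishes by validating your annotated data in the MMDetection framework.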

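1. Data Annotation: Building a High-Quality Dataset with Labelme

Every machine learning project starts with data preparation, and instance segmentation depends especially heavily on precise annotation. Labelme, an open-source image annotation tool, is the first choice of many researchers because it is easy to use and has strong polygon annotation support.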

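1.1 Installing and Configuring Labelme

Before annotating, set up the working environment. A Python virtual environment is recommended for managing dependencies:

    # Create and activate a virtual environment
    python -m venv labelme_env
    source labelme_env/bin/activate   # Linux/Mac
    labelme_env\Scripts\activate      # Windows

    # Install Labelme
    pip install labelme pyqt5

Tip: if PyQt5 fails to install, try installing the system package first: sudo apt-get install python3-pyqt5 (Ubuntu) or brew install pyqt (macOS).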

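1.2 Creating the Label Specification File

Before annotating, define a labels.txt file that fixes the class list. This file affects not only the annotation process but also the later model training. Put one label on each line:

    __ignore__
    _background_
    building
    vehicle
    tree

__ignore__ and _background_ are fixed entries that Labelme's example conversion scripts expect at the top of the file; do not modify or delete them
Use English class names to avoid encoding problems
The order of the classes determines the category IDs assigned for training, so do not change it once it is set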
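To make the link between line order and category ID concrete, here is a minimal sketch. It assumes the convention used by Labelme's example converter, in which the first line (__ignore__) is skipped and the remaining lines are numbered from 0; the helper name class_ids_from_labels is hypothetical.

```python
def class_ids_from_labels(lines):
    """Map each class name to an ID based on its position in labels.txt."""
    names = [line.strip() for line in lines if line.strip()]
    if not names or names[0] != "__ignore__":
        raise ValueError("labels.txt must start with __ignore__")
    # __ignore__ is skipped; the next line gets ID 0, the one after gets ID 1, ...
    return {name: index for index, name in enumerate(names[1:])}


# The same five lines as the labels.txt shown above.
labels = ["__ignore__", "_background_", "building", "vehicle", "tree"]
class_ids = class_ids_from_labels(labels)
print(class_ids)
```

Inserting a new class in the middle of the file would shift every ID after it, which is why new classes are best appended at the end.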

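1.3 Tips for Efficient Annotation

Launch the annotation interface:

    labelme --labels=labels.txt --nodata

In practice, the following tips noticeably improve both speed and quality: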

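Polygon annotation: for irregular objects, start on the boundary and click key points in clockwise order
Keyboard shortcuts:
- Ctrl+Z undoes the last step
- Esc cancels the polygon being drawn
- Enter finishes the current polygon
Complex objects:
- Annotate an occluded object as several separate parts
- Use the group_id field to link the parts that belong to the same object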

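When annotation is finished, each image has a matching JSON file that holds its annotations. Because the interface was launched with --nodata, the file stores only the image path (imagePath) rather than the embedded image data, so keep each JSON file next to its image.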
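The sketch below shows how such a file can be read and how group_id ties several polygons to one object. The JSON content is a hand-written stand-in with made-up values, and group_instances is a hypothetical helper rather than part of Labelme.

```python
import json
from collections import defaultdict

# Hand-written stand-in for one Labelme JSON file (values are made up).
sample = json.loads("""
{
  "imagePath": "img1.jpg",
  "imageData": null,
  "imageHeight": 480,
  "imageWidth": 640,
  "shapes": [
    {"label": "vehicle", "group_id": 1, "shape_type": "polygon",
     "points": [[10, 10], [60, 10], [60, 40], [10, 40]]},
    {"label": "vehicle", "group_id": 1, "shape_type": "polygon",
     "points": [[80, 10], [120, 10], [120, 40], [80, 40]]},
    {"label": "tree", "group_id": null, "shape_type": "polygon",
     "points": [[200, 50], [240, 50], [220, 120]]}
  ]
}
""")


def group_instances(shapes):
    """Group polygons into instances.

    Shapes sharing the same (label, group_id) form one instance; a shape
    without a group_id is treated as an instance of its own.
    """
    instances = defaultdict(list)
    for index, shape in enumerate(shapes):
        group = shape.get("group_id")
        key = (shape["label"], group if group is not None else f"single-{index}")
        instances[key].append(shape["points"])
    return dict(instances)


instances = group_instances(sample["shapes"])
for (label, group), polygons in instances.items():
    print(label, group, len(polygons), "polygon(s)")
```

Here the two vehicle polygons collapse into a single instance, while the tree stays separate.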

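2. Format Conversion: From Labelme to the COCO Standard

Most deep learning frameworks support the COCO data format, so the Labelme JSON annotations need to be converted to COCO.

2.1 Understanding the Format Differences

The core difference between Labelme and COCO lies in how the data is organized:

Feature	Labelme format	COCO format
Storage	One JSON file per image	One annotations.json for the whole dataset
Image reference	Embedded image data or a path	Referenced through the file_name field
Annotation structure	A plain list of polygon points	A unified segmentation field
Class management	Independent in each file	A global categories field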
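As an illustration of the structural difference, the following sketch turns one Labelme-style point list into a COCO-style annotation entry. The helper polygon_to_coco is hypothetical, and the area comes from the shoelace formula, which only approximates the mask-based area that converter tools typically compute.

```python
def polygon_to_coco(points, image_id, category_id, annotation_id):
    """Build one COCO-style annotation dict from a list of [x, y] points."""
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    # COCO stores each polygon as a flat list: [x1, y1, x2, y2, ...].
    flat = [coord for x, y in zip(xs, ys) for coord in (x, y)]
    # Shoelace formula: twice the signed area of the polygon.
    n = len(points)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    x_min, y_min = min(xs), min(ys)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [flat],
        # COCO boxes are [x, y, width, height], not corner coordinates.
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


annotation = polygon_to_coco(
    [[10, 10], [60, 10], [60, 40], [10, 40]],
    image_id=1, category_id=2, annotation_id=1,
)
print(annotation["bbox"], annotation["area"])
```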

2.2 Running the Conversion

Labelme ships a conversion script, usually found in the labelme/examples/instance_segmentation/ directory:

    python labelme2coco.py labeled_data/ output/ --labels labels.txt

After the conversion, the output directory looks like this:

    output/
    ├── annotations.json
    ├── JPEGImages/
    │   ├── img1.jpg
    │   └── img2.jpg
    └── Visualization/
        ├── img1.jpg
        └── img2.jpg

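2.3 Verifying the Conversion

After converting, always check that annotations.json is complete. This Python snippet gives a quick check:

    import json

    with open('output/annotations.json') as f:
        coco_data = json.load(f)

    print(f"Images: {len(coco_data['images'])}")
    print(f"Annotations: {len(coco_data['annotations'])}")
    print(f"Categories: {coco_data['categories']}")

    # Check that annotations point at existing images.
    # Index images by ID once so each lookup is cheap.
    images_by_id = {img['id']: img for img in coco_data['images']}
    for ann in coco_data['annotations'][:5]:
        img_info = images_by_id[ann['image_id']]
        print(f"Image {img_info['file_name']} has annotation {ann['id']}")

Common problems:

Out-of-bounds coordinates: check whether any segmentation point lies outside the image
Class mismatch: confirm that labels.txt and the annotation files use the same class names
Empty annotation files: delete or fix images that have no valid annotations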
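Two of these checks are easy to automate. The sketch below is one possible approach, assuming polygon-style segmentation (lists of flat coordinate lists, not RLE) and that every image entry carries width and height; find_annotation_problems and the toy data are illustrative only.

```python
def find_annotation_problems(coco):
    """Return (IDs of out-of-bounds annotations, IDs of images with no annotations)."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    annotated = set()
    out_of_bounds = []
    for ann in coco["annotations"]:
        annotated.add(ann["image_id"])
        width, height = sizes[ann["image_id"]]
        for polygon in ann["segmentation"]:
            xs, ys = polygon[0::2], polygon[1::2]  # even indices are x, odd are y
            if min(xs) < 0 or min(ys) < 0 or max(xs) > width or max(ys) > height:
                out_of_bounds.append(ann["id"])
                break
    empty = [image_id for image_id in sizes if image_id not in annotated]
    return out_of_bounds, empty


# Toy dataset: annotation 2 leaves the 100x80 image, image 2 has no annotations.
toy = {
    "images": [
        {"id": 1, "file_name": "img1.jpg", "width": 100, "height": 80},
        {"id": 2, "file_name": "img2.jpg", "width": 100, "height": 80},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "segmentation": [[10, 10, 50, 10, 50, 40]]},
        {"id": 2, "image_id": 1, "segmentation": [[90, 10, 120, 10, 120, 40]]},
    ],
}
bad_ids, empty_ids = find_annotation_problems(toy)
print("out of bounds:", bad_ids, "| images without annotations:", empty_ids)
```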

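3. Dataset Splitting and Augmentation

3.1 Splitting into Training and Validation Sets

scikit-learn makes the split straightforward. The script below splits the raw Labelme files, so run the conversion from section 2.2 once on each resulting folder to get a separate COCO annotation file for the training and validation sets:

    import os
    import shutil

    from sklearn.model_selection import train_test_split

    json_files = [f for f in os.listdir('labeled_data') if f.endswith('.json')]
    # Hold out 20% for validation; a fixed seed keeps the split reproducible
    train_files, val_files = train_test_split(json_files, test_size=0.2, random_state=42)

    for folder, files in [('train', train_files), ('val', val_files)]:
        os.makedirs(folder, exist_ok=True)
        for file in files:
            stem = os.path.splitext(file)[0]
            # Copy the annotation and its image (assumed to be a .jpg with the same name)
            shutil.copy(f'labeled_data/{file}', f'{folder}/{file}')
            shutil.copy(f'labeled_data/{stem}.jpg', f'{folder}/{stem}.jpg')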
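If the whole dataset has already been converted into a single annotations.json, an alternative is to split that file by image, which yields the separate train.json and val.json that the dataset config in section 4.2 refers to. This is a minimal standard-library sketch; split_coco is a hypothetical helper and the toy data is made up.

```python
import random


def split_coco(coco, val_fraction=0.2, seed=42):
    """Split a COCO dict by image into (train, val) dicts."""
    image_ids = sorted(img["id"] for img in coco["images"])
    random.Random(seed).shuffle(image_ids)  # seeded, so the split is reproducible
    n_val = max(1, round(len(image_ids) * val_fraction))
    val_ids = set(image_ids[:n_val])

    def subset(keep_val):
        return {
            "images": [i for i in coco["images"] if (i["id"] in val_ids) == keep_val],
            "annotations": [
                a for a in coco["annotations"] if (a["image_id"] in val_ids) == keep_val
            ],
            # Both splits share the full category list so IDs stay consistent.
            "categories": coco["categories"],
        }

    return subset(False), subset(True)


toy = {
    "images": [{"id": i, "file_name": f"img{i}.jpg"} for i in range(1, 11)],
    "annotations": [{"id": i, "image_id": i} for i in range(1, 11)],
    "categories": [{"id": 1, "name": "building"}],
}
train, val = split_coco(toy)
print(len(train["images"]), "train images,", len(val["images"]), "val images")
```

Splitting by image rather than by annotation keeps all objects of one picture in the same split, which avoids leakage between training and validation.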

3.2 Data Augmentation Strategy

In MMDetection, rich data augmentation can be set up through the config file. A typical configuration (MMDetection 2.x syntax) looks like this:

    # Short-side candidates 480, 512, ..., 800, each with the long side capped at 1333
    img_scales = [(s, 1333) for s in range(480, 801, 32)]

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='AutoAugment',
            policies=[
                # Policy 1: multi-scale resize only
                [
                    dict(type='Resize', img_scale=img_scales,
                         multiscale_mode='value', keep_ratio=True),
                ],
                # Policy 2: random crop, then multi-scale resize
                [
                    dict(type='RandomCrop', crop_type='absolute_range',
                         crop_size=(384, 600), allow_negative_crop=True),
                    dict(type='Resize', img_scale=img_scales,
                         multiscale_mode='value', override=True, keep_ratio=True),
                ],
            ]),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53],
             std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
    ]
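To see what a Resize step with keep_ratio=True does to an image, the sketch below computes the output size for a given img_scale. It is written from the documented behavior (the long edge is limited by the larger number, the short edge by the smaller one) and is an approximation of the library's logic, not its actual code; rescale_size is a hypothetical helper.

```python
def rescale_size(width, height, img_scale):
    """Return (width, height) after an aspect-preserving resize to img_scale."""
    max_long_edge, max_short_edge = max(img_scale), min(img_scale)
    # Largest factor that keeps both edges within their limits.
    factor = min(
        max_long_edge / max(width, height),
        max_short_edge / min(width, height),
    )
    return int(width * factor + 0.5), int(height * factor + 0.5)


wide = rescale_size(1920, 1080, (1333, 800))   # limited by the long edge
classic = rescale_size(640, 480, (1333, 800))  # limited by the short edge
print(wide, classic)
```

A 1920x1080 frame ends up at 1333x750 because its long edge hits the 1333 limit first, while a 640x480 image grows to 1067x800 because its short edge reaches 800 first.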

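4. MMDetection Training Configuration in Practice

4.1 Preparing the MMDetection Environment

Install MMDetection and its dependencies. The commands and configs in this article follow MMDetection 2.x conventions (mmcv-full, samples_per_gpu, --eval), so check out a 2.x release of the repository after cloning; MMDetection 3.x uses a different config layout. Replace {cu_version} and {torch_version} with the values that match your CUDA and PyTorch installation:

    pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
    git clone https://github.com/open-mmlab/mmdetection.git
    cd mmdetection
    pip install -r requirements/build.txt
    pip install -v -e .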

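4.2 Dataset Config File

Create coco_instance.py under mmdetection/configs/_base_/datasets/. MMDetection already ships a file with this name in that directory, so either edit it in place or save your version under a new name and reference that name from your config. Note that img_prefix is prepended to every file_name in the annotation file, so make sure the combined path points at the actual image:

    dataset_type = 'CocoDataset'
    data_root = 'data/custom/'
    # Class names must match the category names in the COCO JSON (see labels.txt);
    # without this, CocoDataset falls back to the 80 default COCO classes
    classes = ('building', 'vehicle', 'tree')
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type=dataset_type,
            classes=classes,
            ann_file=data_root + 'annotations/train.json',
            img_prefix=data_root + 'train/',
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            classes=classes,
            ann_file=data_root + 'annotations/val.json',
            img_prefix=data_root + 'val/',
            pipeline=test_pipeline),
        # No separate test set here, so the validation set is reused for testing
        test=dict(
            type=dataset_type,
            classes=classes,
            ann_file=data_root + 'annotations/val.json',
            img_prefix=data_root + 'val/',
            pipeline=test_pipeline))
    # Report both box and mask metrics during validation
    evaluation = dict(metric=['bbox', 'segm'])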

4.3 Choosing and Configuring the Model

For beginners, Mask R-CNN is a good starting point. Modify the config file configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py; in MMDetection 2.x this file inherits its model definition from configs/_base_/models/mask_rcnn_r50_fpn.py, and the relevant part looks like this (train_cfg and test_cfg are left out here and come from the base config). Note that num_classes counts only the foreground classes; since MMDetection 2.0 the background is not included:

    model = dict(
        type='MaskRCNN',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch'),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[.0, .0, .0, .0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        roi_head=dict(
            type='StandardRoIHead',
            # Extracts 7x7 RoI features for the box head
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=3,  # number of your classes (background is NOT counted)
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
            # Extracts 14x14 RoI features for the mask head
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=3,  # number of your classes (background is NOT counted)
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
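Instead of editing the stock files, MMDetection 2.x configs can also inherit from an existing config and override only the fields that depend on the dataset. The following is a minimal sketch of that pattern, assuming a hypothetical new file named configs/mask_rcnn/mask_rcnn_r50_fpn_1x_custom.py and the three example classes from labels.txt; dataset paths (ann_file, img_prefix) would be overridden in the same way.

```python
# Hypothetical file: configs/mask_rcnn/mask_rcnn_r50_fpn_1x_custom.py
# Inherit the model, schedule and runtime settings from the stock config ...
_base_ = './mask_rcnn_r50_fpn_1x_coco.py'

# ... and override only what depends on the custom dataset.
classes = ('building', 'vehicle', 'tree')

model = dict(
    roi_head=dict(
        # Both heads must agree on the number of foreground classes.
        bbox_head=dict(num_classes=len(classes)),
        mask_head=dict(num_classes=len(classes))))

data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```

Because nested dicts are merged key by key with the base config, everything not mentioned here (backbone, RPN, losses, schedule) keeps its default value, which keeps the custom file short and makes later upgrades of the base config easier.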

4.4 Training and Evaluation

Start training with:

    python tools/train.py configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \
        --work-dir work_dirs/mask_rcnn_custom \
        --cfg-options data.samples_per_gpu=2 data.workers_per_gpu=2

Training metrics can be monitored with TensorBoard, provided TensorboardLoggerHook is enabled under log_config.hooks in the config (by default only the text logger is active):

    tensorboard --logdir work_dirs/mask_rcnn_custom

When training finishes, evaluate the model with:

    python tools/test.py configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \
        work_dirs/mask_rcnn_custom/latest.pth \
        --eval bbox segm \
        --show-dir results

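5. Troubleshooting and Performance Optimization

5.1 Typical Problems During Training

Loss does not decrease:
- Check the learning rate: the default of 0.02 is tuned for a total batch size of 16 (8 GPUs x 2 images) and is usually too large for a single-GPU run
- Verify that the annotations are correct
- Try a simpler model or fewer classes
Out of memory (OOM):
- Reduce data.samples_per_gpu
- Use a smaller input image size
- Try gradient accumulation
Abnormal evaluation metrics:
- Confirm that num_classes is set correctly
- Check whether the dataset contains images with empty annotations
- Verify that the COCO conversion is correct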
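For the learning rate, a common rule of thumb is the linear scaling rule: scale the reference learning rate in proportion to the total batch size. The sketch below assumes the stock schedule's reference point of 0.02 for a total batch of 16 images; scale_learning_rate is a hypothetical helper, and the result is a starting point to tune rather than a guaranteed optimum.

```python
def scale_learning_rate(base_lr, base_batch_size, gpus, samples_per_gpu):
    """Linear scaling rule: keep lr / total_batch_size constant."""
    total_batch_size = gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# One GPU with 2 images per GPU, relative to the 8 x 2 reference setup.
single_gpu_lr = scale_learning_rate(0.02, 16, gpus=1, samples_per_gpu=2)
print(single_gpu_lr)
```

The resulting value can be passed on the command line, for example with --cfg-options optimizer.lr=<value>, without editing the schedule file.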

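5.2 Performance Optimization Tips

Model selection guide:

Model	Speed	Accuracy	GPU memory	Best suited for
Mask R-CNN	Medium	High	High	General-purpose use
Cascade Mask R-CNN	Slow	Very high	Very high	High-accuracy requirements
YOLACT	Fast	Medium	Medium	Real-time applications
SOLOv2	Fast	Medium	Low	Densely packed objects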

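Training speed-ups:
- Mixed-precision training: in MMDetection 2.x, add fp16 = dict(loss_scale=512.) to the config (tools/train.py has no --fp16 flag)
- cuDNN benchmark: set cudnn_benchmark = True in the config, which helps most when input sizes are fixed (the CUDA_CACHE_PATH environment variable only relocates the CUDA JIT cache and does not enable benchmarking)
- Data loading: raise data.workers_per_gpu so that the total number of workers across all GPUs is around 70-80% of the CPU core count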
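Ways to improve accuracy:
- Increase the diversity of the data
- Use a stronger backbone (such as ResNeXt-101)
- Adjust the anchor sizes to match the size of your objects
- Train for more iterations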
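Before touching the anchor settings, it helps to measure the objects actually present in the dataset. The sketch below summarizes COCO-style boxes by size (square root of the area) and aspect ratio (height / width); box_statistics and the toy boxes are illustrative. With the anchor generator shown in section 4.3 (scales=[8], strides 4 to 64), the base anchor sizes are 32, 64, 128, 256 and 512 pixels, so objects far outside that range suggest changing scales or ratios.

```python
import math
import statistics


def box_statistics(annotations):
    """Return (sizes, ratios) for COCO [x, y, w, h] boxes.

    size  = sqrt(w * h), comparable to the anchor base size (scale * stride)
    ratio = h / w, comparable to the anchor generator's `ratios`
    """
    sizes, ratios = [], []
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        sizes.append(math.sqrt(w * h))
        ratios.append(h / w)
    return sizes, ratios


# Toy boxes standing in for coco_data['annotations'].
toy_annotations = [
    {"bbox": [0, 0, 32, 32]},
    {"bbox": [0, 0, 64, 128]},
    {"bbox": [0, 0, 200, 50]},
    {"bbox": [0, 0, 0, 10]},  # degenerate, ignored
]
sizes, ratios = box_statistics(toy_annotations)
print("median size:", round(statistics.median(sizes), 1))
print("median ratio:", statistics.median(ratios))
print("size range:", round(min(sizes), 1), "to", round(max(sizes), 1))
```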

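In real projects, I have found that the most time-consuming part is usually data preparation and annotation. One practical piece of advice: before starting large-scale annotation, label a small batch of samples (50-100 images) and run them through the whole pipeline to make sure every step works. This avoids large-scale rework caused by data format or annotation guideline problems that are only discovered later.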
