当前位置：首页 > news >正文

UNet实战：用PyTorch从零搭建宠物分割模型（附OxfordIIITPet数据集处理技巧）

news 2026/3/26 21:10:48

UNet实战：从零构建宠物分割模型与OxfordIIITPet数据集深度解析

在计算机视觉领域，图像分割一直是核心挑战之一。不同于简单的分类任务，分割需要模型理解每个像素的语义信息。UNet作为医学影像分割的经典架构，因其独特的U型设计和跳跃连接机制，在各类分割任务中表现出色。本文将带您从零开始，用PyTorch实现一个完整的宠物分割系统，并深入探讨OxfordIIITPet数据集的处理技巧。

1. 环境准备与数据加载

1.1 基础环境配置

开始前需要确保已安装必要的Python库。推荐使用conda创建虚拟环境：

conda create -n unet python=3.8 conda activate unet pip install torch torchvision numpy matplotlib tqdm

对于GPU加速，建议安装对应CUDA版本的PyTorch。可以通过以下命令检查GPU是否可用：

import torch print(torch.cuda.is_available()) # 应输出True

1.2 OxfordIIITPet数据集详解

OxfordIIITPet数据集包含37类宠物图像，每张图片都有精细的像素级标注。数据集特点包括：

图像数量：7,349张（训练集3,680张，测试集3,669张）
标注类型：前景/背景/边界三值标注
平均分辨率：约500×400像素

常见数据问题及解决方案：

问题类型	表现	解决方法
标注偏移	掩码与图像不对齐	使用双线性插值(图像)+最近邻插值(掩码)
类别不平衡	背景像素远多于前景	采用Dice Loss或加权BCE Loss
边界模糊	边界区域标注不一致	将边界视为背景或单独类别

1.3 自定义数据集类实现

我们需要创建继承自torch.utils.data.Dataset的类来处理数据：

from torchvision.datasets import OxfordIIITPet from torchvision.transforms import InterpolationMode import torchvision.transforms.functional as TF class PetSegDataset(Dataset): def __init__(self, root, split="trainval", size=256): self.dataset = OxfordIIITPet( root=root, split=split, target_types="segmentation", download=True ) self.size = size def __getitem__(self, idx): img, mask = self.dataset[idx] # 保持图像质量的同时调整尺寸 img = TF.resize(img, (self.size, self.size), interpolation=InterpolationMode.BILINEAR) # 避免掩码插值产生无效值 mask = TF.resize(mask, (self.size, self.size), interpolation=InterpolationMode.NEAREST) img_tensor = TF.to_tensor(img) mask_tensor = torch.from_numpy(np.array(mask)) # 将标注转换为二值掩码：1(宠物)/0(背景) mask_tensor = (mask_tensor == 1).float().unsqueeze(0) return img_tensor, mask_tensor

2. UNet模型架构深度解析

2.1 核心组件实现

UNet的成功源于其精心设计的模块组合。我们先实现基础构建块：

import torch.nn as nn class DoubleConv(nn.Module): """双卷积块：Conv->BN->ReLU->Conv->BN->ReLU""" def __init__(self, in_ch, out_ch): super().__init__() self.conv = nn.Sequential( nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True), nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True) ) def forward(self, x): return self.conv(x)

2.2 下采样与上采样模块

下采样采用最大池化+双卷积的组合，而上采样则需要处理尺寸对齐问题：

class Down(nn.Module): """下采样：MaxPool -> DoubleConv""" def __init__(self, in_ch, out_ch): super().__init__() self.mpconv = nn.Sequential( nn.MaxPool2d(2), DoubleConv(in_ch, out_ch) ) def forward(self, x): return self.mpconv(x) class Up(nn.Module): """上采样：转置卷积 -> 跳跃连接 -> DoubleConv""" def __init__(self, in_ch, out_ch): super().__init__() self.up = nn.ConvTranspose2d(in_ch, in_ch//2, kernel_size=2, stride=2) self.conv = DoubleConv(in_ch, out_ch) def forward(self, x1, x2): x1 = self.up(x1) # 处理尺寸不匹配问题 diffY = x2.size()[2] - x1.size()[2] diffX = x2.size()[3] - x1.size()[3] x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2, diffY // 2, diffY - diffY // 2]) x = torch.cat([x2, x1], dim=1) return self.conv(x)

2.3 完整UNet架构

整合各组件构建完整的U型网络：

class UNet(nn.Module): def __init__(self, in_ch=3, out_ch=1, base_ch=64): super(UNet, self).__init__() self.inc = DoubleConv(in_ch, base_ch) self.down1 = Down(base_ch, base_ch*2) self.down2 = Down(base_ch*2, base_ch*4) self.down3 = Down(base_ch*4, base_ch*8) self.down4 = Down(base_ch*8, base_ch*8) self.up1 = Up(base_ch*16, base_ch*4) self.up2 = Up(base_ch*8, base_ch*2) self.up3 = Up(base_ch*4, base_ch) self.up4 = Up(base_ch*2, base_ch) self.outc = nn.Conv2d(base_ch, out_ch, 1) def forward(self, x): x1 = self.inc(x) x2 = self.down1(x1) x3 = self.down2(x2) x4 = self.down3(x3) x5 = self.down4(x4) x = self.up1(x5, x4) x = self.up2(x, x3) x = self.up3(x, x2) x = self.up4(x, x1) logits = self.outc(x) return logits

3. 训练策略与优化技巧

3.1 混合损失函数设计

宠物分割面临的主要挑战是前景背景像素数量严重不平衡。我们组合两种损失：

def dice_loss(pred, target, smooth=1e-6): pred = torch.sigmoid(pred) intersection = (pred * target).sum() union = pred.sum() + target.sum() return 1 - (2. * intersection + smooth) / (union + smooth) def bce_dice_loss(pred, target, bce_weight=0.5): bce = F.binary_cross_entropy_with_logits(pred, target) dice = dice_loss(pred, target) return bce_weight * bce + (1 - bce_weight) * dice

3.2 训练循环实现

采用混合精度训练加速并减少显存占用：

from torch.cuda.amp import GradScaler, autocast def train_epoch(model, loader, optimizer, device, scaler=None): model.train() total_loss = 0 for images, masks in loader: images, masks = images.to(device), masks.to(device) optimizer.zero_grad() with autocast(enabled=scaler is not None): outputs = model(images) loss = bce_dice_loss(outputs, masks) if scaler: scaler.scale(loss).backward() scaler.step(optimizer) scaler.update() else: loss.backward() optimizer.step() total_loss += loss.item() return total_loss / len(loader)

3.3 学习率调度策略

采用余弦退火学习率调整：

from torch.optim.lr_scheduler import CosineAnnealingLR optimizer = torch.optim.Adam(model.parameters(), lr=1e-4) scheduler = CosineAnnealingLR(optimizer, T_max=10, eta_min=1e-6)

4. 模型评估与可视化

4.1 评估指标实现

除常规准确率外，分割任务更关注IoU和Dice系数：

def compute_metrics(pred, target, threshold=0.5): pred = (torch.sigmoid(pred) > threshold).float() target = target.float() intersection = (pred * target).sum() union = torch.logical_or(pred, target).sum() iou = (intersection + 1e-6) / (union + 1e-6) dice = (2 * intersection + 1e-6) / (pred.sum() + target.sum() + 1e-6) return iou.item(), dice.item()

4.2 结果可视化

直观展示模型预测效果：

def plot_results(model, loader, device, n_samples=3): model.eval() with torch.no_grad(): for i, (images, masks) in enumerate(loader): if i >= n_samples: break images, masks = images.to(device), masks.to(device) outputs = model(images) preds = torch.sigmoid(outputs) > 0.5 fig, axes = plt.subplots(1, 3, figsize=(15, 5)) axes[0].imshow(images[0].cpu().permute(1,2,0)) axes[0].set_title('Input') axes[0].axis('off') axes[1].imshow(masks[0].cpu().squeeze(), cmap='gray') axes[1].set_title('Ground Truth') axes[1].axis('off') axes[2].imshow(preds[0].cpu().squeeze(), cmap='gray') axes[2].set_title('Prediction') axes[2].axis('off') plt.show()