当前位置：首页 > news >正文

图像修复、超分、ViT都离不开它：深入浅出图解PyTorch Fold/Unfold的5个实战场景

news 2026/5/16 13:48:40

图像修复、超分、ViT都离不开它：深入浅出图解PyTorch Fold/Unfold的5个实战场景

在计算机视觉领域，PyTorch的Fold和Unfold操作就像瑞士军刀中的万能工具，虽然低调却能在关键时刻解决复杂问题。想象一下，当你需要处理图像块匹配、实现自定义卷积或构建Transformer输入时，这两个操作能让你从繁琐的循环代码中解放出来。本文将带您深入五个实际场景，看看这些"张量乐高积木"如何优雅地拼接出视觉任务的解决方案。

1. 非局部均值去噪：块匹配的艺术

非局部均值去噪的核心思想是利用图像中相似块的加权平均来消除噪声。传统实现需要嵌套循环遍历每个像素和其邻域，而Unfold让这一切变得高效：

import torch import torch.nn as nn def non_local_denoise(image, patch_size=3, search_window=7): # image: (1, C, H, W) unfold = nn.Unfold(kernel_size=patch_size, padding=patch_size//2) patches = unfold(image) # (1, C*p*p, H*W) # 计算块间相似度（简化版） patches_norm = patches / (patches.norm(dim=1, keepdim=True) + 1e-6) similarity = torch.matmul(patches_norm.transpose(1,2), patches_norm) # 加权平均 denoised = torch.matmul(similarity.softmax(dim=-1), patches.transpose(1,2)) # 还原图像 fold = nn.Fold(output_size=image.shape[2:], kernel_size=patch_size, padding=patch_size//2) return fold(denoised.transpose(1,2))

关键优势：

并行计算所有图像块相似度
避免Python循环带来的性能损失
保持与卷积操作一致的接口风格

2. 超分辨率重建：子像素卷积的逆过程

在ESPCN等超分网络中，PixelShuffle通过周期洗牌操作实现上采样。而Unfold可以看作是其逆向操作，将高分辨率图像分解为低分辨率块：

def prepare_hr_patches(hr_image, scale_factor=2): """为生成对抗训练准备HR图像块""" b, c, h, w = hr_image.shape unfold = nn.Unfold(kernel_size=scale_factor, stride=scale_factor) patches = unfold(hr_image) # (b, c*4, h*w/4) return patches.view(b, -1, h//scale_factor, w//scale_factor) # 与PixelShuffle的对应关系 hr_image = torch.randn(1, 3, 32, 32) lr_patches = prepare_hr_patches(hr_image) ps = nn.PixelShuffle(2) reconstructed = ps(lr_patches) # 近似原始HR图像

应用场景对比：

操作类型	输入维度	输出维度	典型用途
Unfold	(b,c,h,w)	(b,ckk,hw/(kk))	特征块提取
PixelShuffle	(b,crr,h,w)	(b,c,hr,wr)	亚像素上采样

3. Vision Transformer：图像分块的工程实现

ViT将图像划分为16x16的块作为Transformer的输入序列。使用Unfold可以高效实现这一过程：

class PatchEmbedding(nn.Module): def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768): super().__init__() self.proj = nn.Sequential( nn.Unfold(kernel_size=patch_size, stride=patch_size), nn.Linear(in_chans * patch_size**2, embed_dim) ) self.num_patches = (img_size // patch_size) ** 2 def forward(self, x): x = self.proj(x) # (b, embed_dim, num_patches) return x.transpose(1, 2) # (b, num_patches, embed_dim)

与传统方法的对比：

手动切片：需要复杂的reshape和permute操作
Unfold实现：
- 自动处理边缘情况（padding）
- 支持dilation参数
- 与卷积参数完全兼容

4. 自定义卷积操作：超越标准卷积核

当需要实现空洞卷积或可变形卷积时，Unfold+手动偏移+Fold的组合提供了灵活的实现方案：

def deformable_conv2d(x, offset, kernel_size=3): """ x: input tensor (b,c,h,w) offset: 偏移量 (b,2*k*k,h,w) """ b, c, h, w = x.shape # 生成采样网格 grid = create_base_grid(h, w) + offset # 双线性采样 sampled = F.grid_sample(x, grid) # 展开为块表示 unfold = nn.Unfold(kernel_size=kernel_size) patches = unfold(sampled) # 自定义卷积核处理 output = apply_custom_kernel(patches) # 还原空间结构 fold = nn.Fold(output_size=(h,w), kernel_size=kernel_size) return fold(output)

性能优化技巧：

使用grid_sample实现亚像素级偏移
通过Unfold保持内存访问局部性
自定义核函数可替换为任意逐块操作

5. 数据增强：网格化图像重组

超越传统的裁剪翻转，Fold/Unfold能实现创新的数据增强方式：

class GridShuffleAugment: def __init__(self, grid_size=4): self.unfold = nn.Unfold(kernel_size=grid_size, stride=grid_size) self.fold = nn.Fold(output_size=(224,224), kernel_size=grid_size, stride=grid_size) def __call__(self, x): # x: (c,h,w) patches = self.unfold(x.unsqueeze(0)) # (1, c*g*g, n) patches = patches[:, :, torch.randperm(patches.size(2))] return self.fold(patches).squeeze(0) # 效果示例 augmenter = GridShuffleAugment() aug_img = augmenter(original_img) # 创建拼贴画风格图像

增强类型对比：