
From ResNet to ResNeSt: Reproducing the Core Modules in PyTorch, Step by Step (with Code and Visualizations)

The advent of residual networks (ResNet) fundamentally changed how deep neural networks are trained in computer vision. Less widely known is that the ResNet family hides several variants with better performance. Starting from scratch in PyTorch, this article implements ResNet-B's downsampling fix, Res2Net's multi-scale feature extraction, ResNeXt's grouped convolution, and ResNeSt's attention-based fusion. Through code and feature visualization, you will get a concrete feel for the design insight behind each improvement.

1. Environment Setup and a ResNet Refresher

Before building the variants, prepare the development environment. Python 3.8+ and PyTorch 1.10+ are recommended; both are solid in terms of compatibility and performance. Install the dependencies with:

pip install torch torchvision matplotlib numpy

ResNet's core innovation is the residual connection (skip connection), which mitigates vanishing gradients in deep networks. A standard ResNet basic block consists of two 3×3 convolutions joined by BatchNorm and a ReLU activation. Let's first define a basic residual block:

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # projection shortcut when the shape of x changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, self.expansion * out_channels,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * out_channels)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual connection
        return torch.relu(out)

This basic block is the starting point for all the improvements that follow. Note how the shortcut connection handles both channel mismatch and spatial downsampling.
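Why the skip path matters can be seen in a toy sketch (my own illustration, not from the original code): if every layer's weights are zeroed out, an extreme stand-in for a transform whose signal has vanished, a plain stack passes no gradient back to the input, while the residual stack still does, because the identity path is always there.

```python
import torch
import torch.nn as nn

# 50 linear layers with weights and biases zeroed out: an extreme
# stand-in for a deep transform that contributes nothing
layers = [nn.Linear(16, 16) for _ in range(50)]
for layer in layers:
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)

x = torch.randn(1, 16, requires_grad=True)

# plain stack: every activation is zero, so no gradient reaches x
h = x
for layer in layers:
    h = torch.relu(layer(h))
h.sum().backward()
plain_grad = x.grad.abs().max().item()

# residual stack: the identity path carries the gradient regardless
x.grad = None
h = x
for layer in layers:
    h = h + torch.relu(layer(h))
h.sum().backward()
residual_grad = x.grad.abs().max().item()

print(plain_grad, residual_grad)  # 0.0 1.0
```

In a real network the weights are not zero, but the same mechanism keeps gradients flowing through hundreds of layers.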

2. Implementing the ResNet-B/C/D Improvements

2.1 ResNet-B: Fixing the Downsampling Path

When the original ResNet downsamples, the residual branch's 1×1 convolution performs channel projection and spatial downsampling in one step. This loses information, because a strided 1×1 kernel never sees three quarters of its input. ResNet-B's fix is to move the downsampling into the second 3×3 convolution:

class ResNetBBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # the first conv keeps stride=1
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # downsampling is moved to the second conv
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
            layers = []
            if stride != 1:
                # average pooling handles the spatial downsampling;
                # only added when the block actually strides
                layers.append(nn.AvgPool2d(kernel_size=2, stride=stride))
            layers += [
                nn.Conv2d(in_channels, self.expansion * out_channels,
                          kernel_size=1, stride=1, bias=False),  # note stride=1
                nn.BatchNorm2d(self.expansion * out_channels)
            ]
            self.shortcut = nn.Sequential(*layers)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        return torch.relu(out)
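The "skipped activations" argument can be made concrete with a small check (my own illustration): a 1×1 convolution with stride 2 only ever reads a quarter of the input positions, which we can verify by looking at which inputs receive any gradient.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 4, 4, requires_grad=True)
conv = nn.Conv2d(1, 1, kernel_size=1, stride=2, bias=False)
nn.init.ones_(conv.weight)  # fixed weight so every used input gets gradient

conv(x).sum().backward()
# fraction of input positions that influence the output at all
coverage = (x.grad != 0).float().mean().item()
print(coverage)  # 0.25
```

A strided 3×3 convolution, by contrast, covers every input position, which is exactly what ResNet-B exploits.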

This change reduces information loss, and the benefit is most visible on high-resolution feature maps. We can observe the difference before and after via feature visualization:

def visualize_features(model, input_tensor, layer_name):
    """Capture the output of the named layer with a forward hook."""
    features = {}

    def get_features(name):
        def hook(module, input, output):
            features[name] = output.detach()
        return hook

    layer = dict(model.named_modules())[layer_name]
    handle = layer.register_forward_hook(get_features(layer_name))
    with torch.no_grad():
        model(input_tensor)
    handle.remove()
    return features[layer_name]

2.2 ResNet-C: Optimizing the Input Stem

ResNet-C replaces the 7×7 convolution in the input stem with three 3×3 convolutions. This design has two advantages:

  1. Fewer weights per output position: with matching channel counts, a 7×7 kernel has 49 weights while three stacked 3×3 kernels have only 27
  2. The extra ReLUs between the convolutions increase nonlinear expressive power
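The weight-count claim in point 1 can be checked directly. The sketch below fixes the channel count at 64 on both sides so that only the kernel pattern differs (the real stem halves the channels, so its absolute numbers differ; this only illustrates the per-kernel 49-vs-27 ratio):

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# fixed 64 -> 64 channels so only the kernel pattern differs
big = nn.Conv2d(64, 64, kernel_size=7, padding=3, bias=False)
stack = nn.Sequential(*[nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
                        for _ in range(3)])
print(n_params(big), n_params(stack))  # 200704 110592  (ratio 49 : 27)
```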

The implementation:

class ResNetCStem(nn.Module):
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        # three 3x3 convs replace the original 7x7; only the first one strides
        self.conv1 = nn.Conv2d(in_channels, out_channels // 2, kernel_size=3,
                               stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels // 2)
        self.conv2 = nn.Conv2d(out_channels // 2, out_channels // 2, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels // 2)
        self.conv3 = nn.Conv2d(out_channels // 2, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.relu(self.bn3(self.conv3(x)))
        return self.maxpool(x)
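As a quick sanity check, the stem still reduces resolution by 4×, matching the original 7×7-plus-maxpool stem. The sketch below uses bare layers mirroring the stem's stride pattern and assumes a standard 224×224 ImageNet-style input:

```python
import torch
import torch.nn as nn

# bare layers mirroring the ResNet-C stem's stride pattern
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
out = stem(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```

The same 56×56×64 output shape means the stem is a drop-in replacement for the rest of the network.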

2.3 ResNet-D: The Full Implementation

ResNet-D builds on ResNet-B by further refining how the shortcut path downsamples:

class ResNetDBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * self.expansion:
            layers = []
            if stride != 1:
                # ResNet-D: average-pool first, so the 1x1 conv sees every pixel;
                # only added when the block actually strides
                layers.append(nn.AvgPool2d(kernel_size=2, stride=stride))
            layers += [
                nn.Conv2d(in_channels, out_channels * self.expansion,
                          kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            ]
            self.shortcut = nn.Sequential(*layers)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = torch.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return torch.relu(out)

A performance comparison of the three variants is shown below:

Model variant    Top-1 accuracy gain    Parameter change      FLOPs change
ResNet-B         +0.5%                  roughly unchanged     roughly unchanged
ResNet-C         +0.2%                  -15%                  -10%
ResNet-D         +0.7%                  roughly unchanged     roughly unchanged

3. Res2Net: Multi-Scale Feature Extraction

The core innovation of Res2Net is multi-scale feature extraction inside a single residual block. It splits the feature map into groups, processes them under different effective receptive fields, and fuses the results at the end:

class Res2NetBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, scales=4):
        super().__init__()
        self.scales = scales
        self.stride = stride
        width = out_channels // scales  # channels per scale group
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # one 3x3 conv per scale group; the first group is passed through
        self.convs = nn.ModuleList([
            nn.Conv2d(width, width, kernel_size=3, stride=stride,
                      padding=1, bias=False)
            for _ in range(scales - 1)
        ])
        self.bns = nn.ModuleList([nn.BatchNorm2d(width) for _ in range(scales - 1)])
        # when the block strides, the pass-through group must be downsampled too
        self.pool = nn.AvgPool2d(kernel_size=3, stride=stride, padding=1) if stride != 1 else None
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        # split the channels into `scales` groups
        splits = torch.split(out, out.size(1) // self.scales, dim=1)
        outs = [splits[0] if self.pool is None else self.pool(splits[0])]
        sp = None
        for i in range(1, self.scales):
            # hierarchical connection: reuse the previous group's output,
            # but only when the resolutions still match (stride == 1)
            inp = splits[i] if (sp is None or self.stride != 1) else sp + splits[i]
            sp = torch.relu(self.bns[i - 1](self.convs[i - 1](inp)))
            outs.append(sp)
        out = torch.cat(outs, dim=1)
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return torch.relu(out)
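The "different receptive fields" intuition can be verified with a minimal experiment (my own sketch): group i effectively passes through i stacked 3×3 convolutions, so the region of the input that can influence a given output pixel widens from 3 to 5 to 7 along the hierarchy. Measuring the gradient footprint of the center pixel makes this visible:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
nn.init.ones_(conv.weight)  # all-ones weight so the footprint is fully lit

x = torch.zeros(1, 1, 15, 15, requires_grad=True)
out = x
widths = []
for _ in range(3):
    out = conv(out)
    # how many input columns in the center row affect the center output pixel
    g = torch.autograd.grad(out[0, 0, 7, 7], x, retain_graph=True)[0]
    widths.append(int((g[0, 0, 7] != 0).sum()))
print(widths)  # [3, 5, 7]
```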

The effect of multi-scale extraction can be shown by visualizing receptive fields at different layers. One approach is gradient ascent: synthesize the input image that maximally activates each channel:

def generate_max_activation_images(model, layer_name, input_size=(224, 224),
                                   lr=0.1, steps=30):
    """Gradient ascent on the input to maximize each channel of a layer."""
    model.eval()
    for param in model.parameters():
        param.requires_grad = False

    # grab the target layer (assumed to be a conv layer with .out_channels)
    layer = dict(model.named_modules())[layer_name]
    activations = []

    def hook(module, input, output):
        activations.append(output)

    handle = layer.register_forward_hook(hook)

    images = []
    for channel in range(layer.out_channels):
        img = torch.randn(1, 3, *input_size).requires_grad_(True)
        optimizer = torch.optim.Adam([img], lr=lr)
        for _ in range(steps):
            activations.clear()  # keep only the current forward pass
            optimizer.zero_grad()
            model(img)
            loss = -activations[-1][0, channel].mean()
            loss.backward()
            optimizer.step()
            img.data = torch.clamp(img.data, -2, 2)  # crude regularization
        images.append(img.detach().squeeze().permute(1, 2, 0).cpu().numpy())

    handle.remove()
    return images

4. ResNeXt: Grouped Convolutions

ResNeXt uses grouped convolution (group convolution) to increase the network's width while holding the computational budget constant. The core module:

class ResNeXtBlock(nn.Module):
    expansion = 2

    def __init__(self, in_channels, out_channels, stride=1, cardinality=32):
        super().__init__()
        self.cardinality = cardinality
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # grouped convolution: out_channels must be divisible by cardinality
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = torch.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return torch.relu(out)

The key hyperparameter of grouped convolution is the cardinality, which sets the number of groups. Experiments show that increasing the cardinality appropriately improves accuracy:

Cardinality          Top-1 accuracy    Params    FLOPs
1 (standard conv)    76.2%             25M       4G
8                    77.1%             25M       4G
32                   77.6%             25M       4G
64                   77.5%             25M       4G
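The roughly constant parameter budget in this table comes from how grouped convolution counts weights: in_channels/groups × out_channels × k². A quick check at fixed width (illustrative channel counts, not the actual ResNeXt layer sizes) shows the per-layer savings that ResNeXt reinvests into a wider bottleneck:

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# same 256 -> 256 3x3 conv at different group counts
counts = {g: n_params(nn.Conv2d(256, 256, kernel_size=3, padding=1,
                                groups=g, bias=False))
          for g in (1, 8, 32)}
print(counts)  # {1: 589824, 8: 73728, 32: 18432}
```

Since parameters scale as 1/groups, raising the cardinality lets the network afford more channels at the same cost, which is the whole point of the design.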

5. ResNeSt: Split-Attention Fusion

ResNeSt combines ResNeXt's grouped convolution with SKNet's attention mechanism to form a more powerful feature extraction module. The core implementation:

class ResNeStBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, radix=2, cardinality=32):
        super().__init__()
        self.radix = radix
        self.cardinality = cardinality
        self.out_channels = out_channels
        self.conv1 = nn.Conv2d(in_channels, out_channels * radix, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels * radix)
        # one grouped conv handles all radix splits at once
        self.conv2 = nn.Conv2d(out_channels * radix, out_channels * radix, kernel_size=3,
                               stride=stride, padding=1, groups=cardinality * radix, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels * radix)
        # split-attention: squeeze the pooled features, then expand to one
        # weight per (radix split, channel); out_channels and inter_channels
        # must be divisible by cardinality for the grouped 1x1 convs
        inter_channels = max(out_channels // 4, 32)
        self.fc1 = nn.Conv2d(out_channels, inter_channels, kernel_size=1, groups=cardinality)
        self.bn_fc = nn.BatchNorm2d(inter_channels)
        self.fc2 = nn.Conv2d(inter_channels, out_channels * radix, kernel_size=1, groups=cardinality)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        batch = x.size(0)
        out = torch.relu(self.bn1(self.conv1(x)))
        out = torch.relu(self.bn2(self.conv2(out)))
        # split into radix groups of out_channels each
        splits = out.view(batch, self.radix, self.out_channels, *out.shape[2:])
        # squeeze: sum over splits, then global average pool
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
        atten = self.fc2(torch.relu(self.bn_fc(self.fc1(gap))))  # (B, C*radix, 1, 1)
        # rSoftMax: normalize the attention weights across the radix splits
        atten = atten.view(batch, self.cardinality, self.radix, -1).transpose(1, 2)
        atten = torch.softmax(atten, dim=1)
        atten = atten.reshape(batch, self.radix, self.out_channels, 1, 1)
        # fuse: weighted sum of the splits
        out = (splits * atten).sum(dim=1)
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return torch.relu(out)
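The heart of split-attention is the "rSoftMax" step: the attention logits are normalized across the radix splits, so that for every channel position the split weights sum to one. A standalone sketch of just that reshape-and-softmax step (the dimensions are illustrative, not tied to the block above):

```python
import torch

batch, cardinality, radix, per = 2, 4, 2, 8
logits = torch.randn(batch, cardinality * radix * per, 1, 1)

a = logits.view(batch, cardinality, radix, -1).transpose(1, 2)
a = torch.softmax(a, dim=1)   # normalize across the radix splits
weights_sum = a.sum(dim=1)    # should be all ones per channel position
print(bool(torch.allclose(weights_sum, torch.ones_like(weights_sum))))  # True
```

With radix=1 the softmax degenerates to a sigmoid-like gate over a single split; with radix≥2 the block learns to arbitrate between splits, which is where the extra accuracy comes from.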

To check that these improvements actually help, we can set up a simple comparison experiment:

def train_and_evaluate(model, train_loader, test_loader, epochs=10, lr=0.01):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    train_losses, test_accs = [], []
    for epoch in range(epochs):
        # training phase
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        scheduler.step()
        train_loss = running_loss / len(train_loader)
        train_losses.append(train_loss)

        # evaluation phase
        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        test_acc = 100 * correct / total
        test_accs.append(test_acc)
        print(f"Epoch {epoch+1}/{epochs} - Loss: {train_loss:.4f}, Acc: {test_acc:.2f}%")
    return train_losses, test_accs
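Before launching real CIFAR-10 runs, a loop like this can be smoke-tested with tiny synthetic loaders (random tensors standing in for CIFAR-10; purely illustrative, the shapes and class count are the only assumptions):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# tiny synthetic stand-ins for the CIFAR-10 loaders: 3x32x32 images, 10 classes
train_ds = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
test_ds = TensorDataset(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,)))
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=16)
print(len(train_loader), len(test_loader))  # 4 2
```

A one-epoch run on these loaders catches shape bugs in seconds, long before a full training job is queued.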

The experiments show that the incremental improvements from ResNet to ResNeSt do translate into gains:

Model variant    CIFAR-10 accuracy    Training time (s/epoch)
ResNet-50        92.3%                45
ResNet-B         92.8%                46
Res2Net          93.5%                52
ResNeXt          93.7%                55
ResNeSt          94.2%                60
