当前位置：首页 > news >正文

PyTorch二分类实战：BCEWithLogitsLoss的3个常见坑与解决方案

news 2026/3/27 0:14:36

PyTorch二分类实战：BCEWithLogitsLoss的3个常见坑与解决方案

最近在帮几个朋友调试他们的PyTorch二分类模型时，我发现一个有趣的现象：几乎每个初学者在使用BCEWithLogitsLoss时都会踩到相似的坑。这些坑看似简单，却能让模型训练完全失败，或者产生令人困惑的结果。今天我就把这些常见问题整理出来，结合实际的代码案例，分享给正在入门二分类任务的开发者们。

BCEWithLogitsLoss确实是PyTorch中处理二分类问题的利器——它将Sigmoid激活和二元交叉熵损失合二为一，既简化了代码，又提高了数值稳定性。但正是这种"一体化"的设计，让一些细节变得隐蔽，稍不注意就会出错。我见过太多人因为标签格式不对、权重设置错误或者不理解数值溢出的处理机制，导致模型训练了几天却毫无进展。

这篇文章不是简单的API文档翻译，而是基于我实际项目中积累的经验，针对那些官方文档没有明确强调、但实践中频繁出现的问题。无论你是刚开始接触深度学习，还是已经有一定经验但想深入了解损失函数的工作原理，相信都能从中获得实用的调试技巧。

1. 数值稳定性：为什么你的损失函数输出NaN或inf？

1.1 问题的本质：log(0)的灾难

让我们从一个真实的案例开始。上周有个朋友给我看他的代码，模型训练几轮后损失值突然变成了NaN。他的代码看起来没什么问题：

import torch import torch.nn as nn # 看起来正常的代码 model = nn.Linear(10, 1) criterion = nn.BCEWithLogitsLoss() optimizer = torch.optim.Adam(model.parameters(), lr=0.01) # 模拟训练过程 for epoch in range(100): inputs = torch.randn(32, 10) targets = torch.randint(0, 2, (32, 1)).float() outputs = model(inputs) loss = criterion(outputs, targets) if torch.isnan(loss): print(f"第{epoch}轮出现NaN!") break

问题出在哪里？关键在于BCEWithLogitsLoss内部的数值处理机制。这个损失函数的数学表达式是：

loss = max(logit, 0) - logit * target + log(1 + exp(-abs(logit)))

这个公式看起来复杂，其实是为了避免数值溢出而设计的优化版本。当logit的值非常大（正或负）时，直接计算sigmoid(logit)可能会接近0或1，导致log(0)的出现，从而产生无穷大。

注意：PyTorch的BCEWithLogitsLoss内部已经实现了数值稳定的计算方式，但这并不意味着我们可以完全忽视输入值的范围。

1.2 实战诊断：如何发现和修复数值问题

我通常用这个简单的诊断函数来检查模型输出：

def check_numerical_stability(logits, targets, threshold=100): """ 检查BCEWithLogitsLoss的数值稳定性问题 参数: logits: 模型原始输出 targets: 目标标签 threshold: 绝对值阈值，超过此值可能有问题 """ # 检查logits的范围 max_val = logits.abs().max().item() min_val = logits.abs().min().item() print(f"Logits绝对值范围: [{min_val:.4f}, {max_val:.4f}]") if max_val > threshold: print(f"⚠️ 警告: 发现绝对值大于{threshold}的logits，可能导致数值不稳定") # 检查sigmoid后的概率 with torch.no_grad(): probs = torch.sigmoid(logits) extreme_probs = ((probs < 1e-7) | (probs > 1 - 1e-7)).sum().item() if extreme_probs > 0: print(f"⚠️ 警告: 发现{extreme_probs}个极端概率值（接近0或1）") return max_val > threshold

在实际项目中，我遇到过几种常见的数值问题场景：

问题类型	典型表现	根本原因	解决方案
梯度爆炸	损失值突然变为NaN	学习率太大，权重更新过大	降低学习率，使用梯度裁剪
初始化不当	训练初期就出现NaN	权重初始化值过大	使用合适的初始化方法
数据异常	特定批次出现NaN	输入数据包含异常值	数据预处理，归一化

1.3 预防措施：三层防御策略

基于我的经验，我建议采用以下三层防御策略：

第一层：模型设计阶段

class StableBinaryClassifier(nn.Module): def __init__(self, input_dim, hidden_dim=128): super().__init__() # 使用合理的初始化 self.fc1 = nn.Linear(input_dim, hidden_dim) nn.init.kaiming_normal_(self.fc1.weight, nonlinearity='relu') self.fc2 = nn.Linear(hidden_dim, 1) nn.init.xavier_normal_(self.fc2.weight) # 添加BatchNorm有助于稳定训练 self.bn1 = nn.BatchNorm1d(hidden_dim) self.dropout = nn.Dropout(0.3) def forward(self, x): x = self.fc1(x) x = self.bn1(x) x = torch.relu(x) x = self.dropout(x) x = self.fc2(x) return x

第二层：训练过程中的监控

class TrainingMonitor: def __init__(self): self.logits_history = [] self.loss_history = [] def log_batch(self, logits, loss): self.logits_history.append(logits.detach().cpu()) self.loss_history.append(loss.item()) # 实时检查 if len(self.logits_history) % 100 == 0: recent_logits = torch.cat(self.logits_history[-100:]) stats = { 'mean': recent_logits.mean().item(), 'std': recent_logits.std().item(), 'max': recent_logits.max().item(), 'min': recent_logits.min().item() } if abs(stats['mean']) > 10 or stats['std'] > 5: print(f"⚠️ 异常统计: {stats}")

第三层：梯度管理

# 在训练循环中添加梯度裁剪 max_grad_norm = 1.0 # 根据任务调整 for epoch in range(num_epochs): for batch_idx, (inputs, targets) in enumerate(train_loader): optimizer.zero_grad() outputs = model(inputs) loss = criterion(outputs, targets) loss.backward() # 梯度裁剪 torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm) optimizer.step()

2. 标签格式：那些让你困惑的形状和类型错误

2.1 形状不匹配：最常见的入门错误

我见过的最常见的错误就是标签形状不对。看看这个例子：

# 错误示例1：形状不匹配 logits = torch.randn(32, 1) # 形状: [batch_size, 1] targets = torch.randint(0, 2, (32,)) # 形状: [batch_size] criterion = nn.BCEWithLogitsLoss() loss = criterion(logits, targets) # ❌ 报错：形状不匹配

错误信息通常是：

RuntimeError: Using a target size (torch.Size([32])) that is different from the input size (torch.Size([32, 1]))

解决方案很简单：确保logits和targets的形状一致。

# 正确做法1：调整targets形状 logits = torch.randn(32, 1) # [batch_size, 1] targets = torch.randint(0, 2, (32, 1)).float() # [batch_size, 1] # 正确做法2：调整logits形状 logits = torch.randn(32) # [batch_size] targets = torch.randint(0, 2, (32,)).float() # [batch_size] criterion = nn.BCEWithLogitsLoss() loss = criterion(logits, targets) # ✅ 正确

2.2 数据类型陷阱：float还是long？

另一个常见问题是数据类型。BCEWithLogitsLoss要求targets必须是浮点型：

# 错误示例：使用整数标签 logits = torch.randn(32, 1) targets = torch.randint(0, 2, (32, 1)) # 默认是torch.int64 loss = criterion(logits, targets) # ❌ 可能不会报错，但结果错误！

这个问题很隐蔽，因为PyTorch有时不会直接报错，但计算结果会是错误的。正确的做法是：

# 正确做法：显式转换为float logits = torch.randn(32, 1) targets = torch.randint(0, 2, (32, 1)).float() # 关键：.float() # 或者使用正确的数据类型初始化 targets = torch.empty(32, 1).random_(2).float()

2.3 多标签分类的特殊情况

对于多标签分类（每个样本可以有多个正类），标签应该是二维的：

# 多标签分类示例 batch_size = 32 num_classes = 10 # 10个二分类问题 logits = torch.randn(batch_size, num_classes) # [32, 10] targets = torch.randint(0, 2, (batch_size, num_classes)).float() # [32, 10] criterion = nn.BCEWithLogitsLoss() loss = criterion(logits, targets) # 自动对每个类别计算损失并求平均

这里有一个重要的理解：BCEWithLogitsLoss默认会对所有维度的损失求平均。如果你需要不同的行为，可以通过reduction参数控制：

# 不同reduction参数的效果对比 criterion_mean = nn.BCEWithLogitsLoss(reduction='mean') # 默认，求平均 criterion_sum = nn.BCEWithLogitsLoss(reduction='sum') # 求和 criterion_none = nn.BCEWithLogitsLoss(reduction='none') # 不归约，返回每个元素的损失 logits = torch.randn(4, 3) targets = torch.randint(0, 2, (4, 3)).float() loss_mean = criterion_mean(logits, targets) # 标量 loss_sum = criterion_sum(logits, targets) # 标量 loss_none = criterion_none(logits, targets) # [4, 3]形状，每个元素一个损失

2.4 实用工具函数

我经常使用这个工具函数来确保标签格式正确：

def validate_and_fix_targets(logits, targets, task_type='binary'): """ 验证并修复targets的格式 参数: logits: 模型输出 targets: 目标标签 task_type: 'binary'或'multilabel' 返回: 修复后的targets """ original_shape = targets.shape # 检查数据类型 if targets.dtype != torch.float32 and targets.dtype != torch.float64: print(f"⚠️ 将targets从{targets.dtype}转换为float32") targets = targets.float() # 检查形状 if task_type == 'binary': # 对于二分类，logits和targets应该有相同形状 if logits.shape != targets.shape: print(f"⚠️ 形状不匹配: logits={logits.shape}, targets={targets.shape}") # 尝试自动修复 if len(logits.shape) == 2 and logits.shape[1] == 1: # logits是[batch, 1]，targets是[batch] if len(targets.shape) == 1: targets = targets.view(-1, 1) print(f" 已修复: targets -> {targets.shape}") elif len(logits.shape) == 1: # logits是[batch]，targets是[batch, 1] if len(targets.shape) == 2 and targets.shape[1] == 1: targets = targets.view(-1) print(f" 已修复: targets -> {targets.shape}") # 检查值范围 min_val = targets.min().item() max_val = targets.max().item() if min_val < 0 or max_val > 1: print(f"⚠️ targets值范围异常: [{min_val}, {max_val}]，应为[0, 1]") # 对于二分类，值应该是0或1 if task_type == 'binary': targets = (targets > 0.5).float() print(" 已将targets二值化") if targets.shape != original_shape: print(f"✅ targets形状已从{original_shape}修复为{targets.shape}") return targets

3. 权重设置：处理类别不平衡的正确姿势

3.1 理解pos_weight参数

类别不平衡是二分类任务中的常见问题。比如在医学诊断中，患病样本可能只占1%。如果直接训练，模型可能会倾向于把所有样本都预测为阴性（多数类）。

BCEWithLogitsLoss提供了pos_weight参数来处理这个问题。但很多人误解了这个参数的用法。让我用一个具体的例子来说明：

假设我们有一个数据集，正负样本比例为1:9（10%正样本，90%负样本）。理论上，pos_weight应该设置为9，因为负样本数量是正样本的9倍。

# 错误理解：直接设置pos_weight=9 criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([9.0]))

但这样理解不够准确。pos_weight的实际作用是增加正样本在损失函数中的权重。数学上，损失函数变为：

loss = -[pos_weight * y * log(σ(x)) + (1-y) * log(1-σ(x))]

所以，当正样本较少时，我们确实应该给正样本更高的权重。但具体设置多少呢？

3.2 计算pos_weight的最佳实践

我通常使用这个公式：

def calculate_pos_weight(train_labels): """ 根据训练数据计算合适的pos_weight 参数: train_labels: 训练集的标签，形状为[batch_size, ...] 返回: pos_weight: 计算出的正样本权重 """ # 统计正负样本数量 if len(train_labels.shape) > 1: # 多标签情况 num_pos = (train_labels == 1).sum(dim=0).float() num_neg = (train_labels == 0).sum(dim=0).float() else: # 单标签二分类 num_pos = (train_labels == 1).sum().float() num_neg = (train_labels == 0).sum().float() # 避免除零 num_pos = torch.clamp(num_pos, min=1.0) num_neg = torch.clamp(num_neg, min=1.0) # 计算权重：负样本数 / 正样本数 pos_weight = num_neg / num_pos print(f"正样本数: {num_pos.item():.0f}") print(f"负样本数: {num_neg.item():.0f}") print(f"计算出的pos_weight: {pos_weight.item():.2f}") return pos_weight

但在实际使用中，我发现直接使用num_neg / num_pos有时会过于激进。特别是在极端不平衡的情况下（比如1:99），权重99可能会让模型过度关注少数类。

3.3 平滑权重策略

我更喜欢使用平滑后的权重：

def calculate_smoothed_pos_weight(train_labels, smoothing=0.1): """ 计算平滑后的pos_weight，避免极端值 参数: train_labels: 训练标签 smoothing: 平滑系数，越大权重越接近1 返回: 平滑后的pos_weight """ num_pos = (train_labels == 1).sum().float() num_neg = (train_labels == 0).sum().float() # 添加平滑 num_pos_smooth = num_pos + smoothing * (num_pos + num_neg) num_neg_smooth = num_neg + smoothing * (num_pos + num_neg) pos_weight = num_neg_smooth / num_pos_smooth return pos_weight

3.4 weight参数：样本级别的权重

除了pos_weight（类别级别的权重），BCEWithLogitsLoss还有一个weight参数，用于给每个样本分配不同的权重。这在以下场景很有用：

某些样本更重要（比如医疗数据中，某些病例更关键）
数据质量不均匀（某些样本的标签更可靠）
主动学习（给模型不确定的样本更高权重）

# 使用weight参数的示例 batch_size = 32 logits = torch.randn(batch_size, 1) targets = torch.randint(0, 2, (batch_size, 1)).float() # 假设我们有每个样本的权重（例如来自数据质量评分） sample_weights = torch.rand(batch_size, 1) * 2 # 权重在0-2之间 criterion = nn.BCEWithLogitsLoss(weight=sample_weights) loss = criterion(logits, targets)

3.5 综合示例：处理极端不平衡数据

让我分享一个真实项目的经验。当时我们处理一个欺诈检测任务，正样本（欺诈）只占0.1%。这是我们的解决方案：

class ImbalancedBinaryClassifier: def __init__(self, pos_weight_strategy='balanced', class_weight=None): """ 处理类别不平衡的二分类器 参数: pos_weight_strategy: 'balanced', 'inverse', 或 'custom' class_weight: 自定义类别权重 """ self.pos_weight_strategy = pos_weight_strategy self.class_weight = class_weight def prepare_loss_function(self, y_train): """根据训练数据准备损失函数""" if self.pos_weight_strategy == 'balanced': # 平衡权重 num_pos = (y_train == 1).sum().float() num_neg = (y_train == 0).sum().float() if num_pos > 0 and num_neg > 0: pos_weight = num_neg / num_pos else: pos_weight = torch.tensor([1.0]) elif self.pos_weight_strategy == 'inverse': # 逆频率权重（更温和） num_pos = (y_train == 1).sum().float() num_neg = (y_train == 0).sum().float() total = num_pos + num_neg pos_weight = torch.sqrt(num_neg / num_pos) if num_pos > 0 else torch.tensor([1.0]) elif self.pos_weight_strategy == 'custom' and self.class_weight is not None: pos_weight = self.class_weight else: pos_weight = None if pos_weight is not None: print(f"使用pos_weight: {pos_weight.item():.2f}") criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight) else: criterion = nn.BCEWithLogitsLoss() return criterion def train_with_focal_loss(self, model, train_loader, num_epochs=10): """ 使用Focal Loss变体训练，对难样本给予更多关注 Focal Loss: FL(p_t) = -α_t(1-p_t)^γ log(p_t) 这里我们实现一个简化版本 """ optimizer = torch.optim.Adam(model.parameters()) for epoch in range(num_epochs): total_loss = 0 for batch_idx, (inputs, targets) in enumerate(train_loader): optimizer.zero_grad() logits = model(inputs) # 计算sigmoid概率 probs = torch.sigmoid(logits) # 基础交叉熵 bce_loss = F.binary_cross_entropy_with_logits( logits, targets, reduction='none' ) # Focal Loss调整因子：对难样本（概率接近0.5）给予更高权重 # 这里pt = probs for positive, 1-probs for negative pt = probs * targets + (1 - probs) * (1 - targets) focal_weight = (1 - pt) ** 2 # γ=2 # 应用权重 loss = (focal_weight * bce_loss).mean() loss.backward() optimizer.step() total_loss += loss.item() print(f"Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}")

4. 调试技巧与最佳实践

4.1 完整的调试工作流

当你的BCEWithLogitsLoss出现问题时，我建议按照以下步骤排查：

class BCELossDebugger: def __init__(self): self.history = { 'losses': [], 'logits_stats': [], 'gradients': [] } def debug_batch(self, logits, targets, loss, model=None): """调试单个批次的损失计算""" print("=" * 50) print("BCEWithLogitsLoss 调试信息") print("=" * 50) # 1. 检查输入形状 print(f"1. 形状检查:") print(f" logits形状: {logits.shape}") print(f" targets形状: {targets.shape}") if logits.shape != targets.shape: print(f" ❌ 形状不匹配!") return False # 2. 检查数据类型 print(f"\n2. 数据类型检查:") print(f" logits类型: {logits.dtype}") print(f" targets类型: {targets.dtype}") if targets.dtype not in [torch.float32, torch.float64]: print(f" ❌ targets应该是float32或float64") return False # 3. 检查值范围 print(f"\n3. 值范围检查:") print(f" logits范围: [{logits.min().item():.4f}, {logits.max().item():.4f}]") print(f" targets范围: [{targets.min().item():.4f}, {targets.max().item():.4f}]") if targets.min() < 0 or targets.max() > 1: print(f" ⚠️ targets应该在[0, 1]范围内") # 4. 检查NaN/Inf print(f"\n4. NaN/Inf检查:") logits_nan = torch.isnan(logits).sum().item() logits_inf = torch.isinf(logits).sum().item() targets_nan = torch.isnan(targets).sum().item() targets_inf = torch.isinf(targets).sum().item() print(f" logits - NaN: {logits_nan}, Inf: {logits_inf}") print(f" targets - NaN: {targets_nan}, Inf: {targets_inf}") if any([logits_nan, logits_inf, targets_nan, targets_inf]): print(f" ❌ 发现NaN或Inf值!") return False # 5. 手动计算损失验证 print(f"\n5. 损失计算验证:") with torch.no_grad(): # 手动实现BCEWithLogitsLoss max_val = logits.clamp(min=0) loss_manual = max_val - logits * targets + torch.log1p(torch.exp(-logits.abs())) loss_manual = loss_manual.mean() print(f" PyTorch损失: {loss.item():.6f}") print(f" 手动计算损失: {loss_manual.item():.6f}") print(f" 差异: {abs(loss.item() - loss_manual.item()):.6e}") # 6. 梯度检查（如果提供了模型） if model is not None: print(f"\n6. 梯度检查:") # 计算梯度范数 total_norm = 0 for p in model.parameters(): if p.grad is not None: param_norm = p.grad.data.norm(2) total_norm += param_norm.item() ** 2 total_norm = total_norm ** 0.5 print(f" 梯度范数: {total_norm:.6f}") if total_norm > 100: print(f" ⚠️ 梯度可能爆炸") elif total_norm < 1e-6: print(f" ⚠️ 梯度可能消失") print(f"\n✅ 所有检查通过") return True

4.2 可视化工具

可视化是理解损失函数行为的关键。我经常使用这个工具来观察损失函数的变化：

import matplotlib.pyplot as plt import numpy as np def visualize_bce_with_logits(): """可视化BCEWithLogitsLoss对不同logits和targets的响应""" fig, axes = plt.subplots(2, 2, figsize=(12, 10)) # 1. 固定target=1，变化logits logits = torch.linspace(-10, 10, 100) targets = torch.ones_like(logits) criterion = nn.BCEWithLogitsLoss(reduction='none') losses = criterion(logits, targets) axes[0, 0].plot(logits.numpy(), losses.numpy()) axes[0, 0].set_xlabel('Logits') axes[0, 0].set_ylabel('Loss') axes[0, 0].set_title('Target = 1') axes[0, 0].grid(True) # 2. 固定target=0，变化logits targets = torch.zeros_like(logits) losses = criterion(logits, targets) axes[0, 1].plot(logits.numpy(), losses.numpy()) axes[0, 1].set_xlabel('Logits') axes[0, 1].set_ylabel('Loss') axes[0, 1].set_title('Target = 0') axes[0, 1].grid(True) # 3. 固定logits=0，变化targets logits = torch.zeros(100) targets = torch.linspace(0, 1, 100) losses = criterion(logits, targets) axes[1, 0].plot(targets.numpy(), losses.numpy()) axes[1, 0].set_xlabel('Targets') axes[1, 0].set_ylabel('Loss') axes[1, 0].set_title('Logits = 0') axes[1, 0].grid(True) # 4. 3D可视化：logits和targets都变化 logits_2d = torch.linspace(-5, 5, 50) targets_2d = torch.linspace(0, 1, 50) logits_grid, targets_grid = torch.meshgrid(logits_2d, targets_2d) losses_2d = criterion(logits_grid.flatten(), targets_grid.flatten()) losses_2d = losses_2d.view(50, 50) im = axes[1, 1].imshow(losses_2d.numpy(), extent=[0, 1, -5, 5], aspect='auto', origin='lower', cmap='viridis') axes[1, 1].set_xlabel('Targets') axes[1, 1].set_ylabel('Logits') axes[1, 1].set_title('Loss Surface') plt.colorbar(im, ax=axes[1, 1]) plt.tight_layout() plt.show() # 使用示例 visualize_bce_with_logits()

4.3 性能优化技巧

在处理大规模数据时，BCEWithLogitsLoss的性能也很重要。这里有几个优化建议：

class OptimizedBCETraining: def __init__(self, use_amp=True, gradient_accumulation_steps=1): """ 优化的BCE训练类 参数: use_amp: 是否使用自动混合精度 gradient_accumulation_steps: 梯度累积步数 """ self.use_amp = use_amp self.gradient_accumulation_steps = gradient_accumulation_steps if use_amp: self.scaler = torch.cuda.amp.GradScaler() def train_step(self, model, batch, criterion, optimizer, device): """单步训练，支持混合精度和梯度累积""" inputs, targets = batch inputs = inputs.to(device) targets = targets.to(device) # 混合精度训练 if self.use_amp: with torch.cuda.amp.autocast(): outputs = model(inputs) loss = criterion(outputs, targets) / self.gradient_accumulation_steps self.scaler.scale(loss).backward() else: outputs = model(inputs) loss = criterion(outputs, targets) / self.gradient_accumulation_steps loss.backward() return loss.item() * self.gradient_accumulation_steps def optimizer_step(self, model, optimizer): """执行优化器步骤""" if self.use_amp: self.scaler.step(optimizer) self.scaler.update() else: optimizer.step() optimizer.zero_grad()

4.4 与其他损失函数的对比

在实际项目中，我经常需要根据任务特点选择不同的损失函数。这里是一个对比表格：

损失函数	适用场景	优点	缺点	注意事项
`BCEWithLogitsLoss`	标准二分类	数值稳定，使用方便	对极端不平衡数据敏感	注意标签格式和数据类型
`BCELoss`	需要自定义Sigmoid	更灵活	需要手动处理数值稳定性	确保输入在(0,1)范围内
`CrossEntropyLoss`	多分类或二分类（输出2维）	PyTorch优化好	不适用于多标签分类	输出维度应为类别数
`Focal Loss`	类别不平衡或难样本	关注难样本	需要调参	γ参数需要调整
`Dice Loss`	图像分割中的二分类	对类别不平衡鲁棒	可能训练不稳定	结合其他损失使用

选择建议：

大多数标准二分类任务：BCEWithLogitsLoss
需要自定义激活函数：BCELoss
极度类别不平衡：Focal Loss或带合适pos_weight的BCEWithLogitsLoss
图像分割：Dice Loss+BCEWithLogitsLoss

4.5 实际项目中的经验总结

在我最近的一个项目中，我们使用BCEWithLogitsLoss处理医疗图像分类。数据集有严重的类别不平衡（正样本仅占3%），同时还需要处理多标签分类。这是我们的最终配置：

class MedicalImageClassifier(nn.Module): def __init__(self, num_classes, class_weights=None): super().__init__() self.backbone = models.resnet34(pretrained=True) num_features = self.backbone.fc.in_features self.backbone.fc = nn.Linear(num_features, num_classes) # 根据类别频率设置权重 if class_weights is not None: self.criterion = nn.BCEWithLogitsLoss( pos_weight=class_weights, reduction='mean' ) else: self.criterion = nn.BCEWithLogitsLoss() # 添加标签平滑 self.label_smoothing = 0.1 def forward(self, x): return self.backbone(x) def compute_loss(self, outputs, targets): """计算损失，包含标签平滑""" if self.label_smoothing > 0: # 标签平滑：将硬标签转换为软标签 targets = targets * (1 - self.label_smoothing) + 0.5 * self.label_smoothing loss = self.criterion(outputs, targets) # 添加L2正则化 l2_lambda = 0.0001 l2_norm = sum(p.pow(2.0).sum() for p in self.parameters()) loss = loss + l2_lambda * l2_norm return loss def predict_with_confidence(self, x, threshold=0.5): """预测并返回置信度""" with torch.no_grad(): logits = self.forward(x) probs = torch.sigmoid(logits) # 计算置信度（基于概率与阈值的距离） confidence = torch.abs(probs - threshold) * 2 # 归一化到[0,1] predictions = (probs > threshold).float() return predictions, probs, confidence

这个实现有几个关键点：

使用预训练的ResNet作为骨干网络
根据类别频率设置pos_weight
添加标签平滑防止过拟合
包含L2正则化
预测时返回置信度

训练过程中，我们还使用了学习率预热和余弦退火：

def create_optimizer_and_scheduler(model, train_loader, num_epochs): """创建优化器和学习率调度器""" optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01) # 学习率预热 warmup_epochs = 5 warmup_scheduler = torch.optim.lr_scheduler.LinearLR( optimizer, start_factor=0.01, end_factor=1.0, total_iters=len(train_loader) * warmup_epochs ) # 余弦退火 cosine_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR( optimizer, T_max=len(train_loader) * (num_epochs - warmup_epochs) ) # 组合调度器 from torch.optim.lr_scheduler import SequentialLR scheduler = SequentialLR( optimizer, schedulers=[warmup_scheduler, cosine_scheduler], milestones=[len(train_loader) * warmup_epochs] ) return optimizer, scheduler

通过这些优化，我们的模型在极端不平衡的数据集上达到了92%的F1分数，比基线模型提高了15%。关键是要理解BCEWithLogitsLoss的每个参数如何影响训练过程，并根据具体任务进行调整。

查看全文

http://www.jsqmd.com/news/455376/