
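YOLOv26 Improvements | Fusion Improvements | Using the Scale-Unified Detection Head DynamicHead Fused with P2 to Add a Small-Object Detection Layer (Leaving Small Objects Nowhere to Hide)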

news 2026/7/5 3:24:40

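Before we begin, a quick recommendation for my column. It covers classification, detection, segmentation, tracking, and keypoint detection, and it is currently on a limited-time discount, so subscriptions are welcome. The column adds 3-5 new mechanisms every week, and subscribers get the files containing all of my improvements plus access to a discussion group.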

1. Introduction

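The improvement introduced in this article is a targeted one: we add a P2 layer for small-object detection and a P6 layer for large-object detection, and run detection with DynamicHead (a one-to-one reproduction of the original version, unlike the modified versions circulating online). The added P2 layer has a higher resolution, which lets the model capture the fine details of small objects. The added P6 layer has a lower resolution but a larger receptive field, so for large objects the model can capture the overall structure more effectively. On top of this, DynamicHead lets the model adjust its detection strategy dynamically to objects of different sizes, further improving accuracy. This topic was requested by a subscriber to the column, so once you subscribe you are welcome to request any mechanism you are interested in.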

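Everyone is welcome to subscribe to my column and learn YOLO together!

Column link: YOLOv26 Effective Improvement column, covering Conv, attention mechanisms, backbones, loss functions, optimizers, post-processing, and other improvement mechanisms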

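Contents
1. Introduction
2. Benefits of Adding P2 and P6 Layers
3. Core Code of DynamicHead
4. Step-by-Step Guide to Adding the DynamicHead Detection Head
4.1 Modification 1
4.2 Modification 2
4.3 Modification 3
4.4 Modification 4
4.5 Modification 5
4.6 Modification 6
4.7 Modification 7
4.8 Modification 8
5. YAML Files for the DynamicHead Detection Head
5.1 YAML File Fusing DynamicHead with P2
5.2 YAML File Fusing DynamicHead with P6
6. Successful Run Records
7. Summary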

2. Benefits of Adding P2 and P6 Layers

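We add the P2 and P6 layers to improve the detection model, specifically its ability to handle objects of different sizes.
1. Benefits of adding a P2 layer:
Better small-object detection: the P2 layer has a higher resolution (stride 4), which lets the model capture the details of small objects. A higher-resolution feature map carries more spatial information, which helps when detecting small objects.
Finer-grained features: because P2 comes from a shallower part of the network, it captures more fine-grained features, which are important for understanding the shape and texture of small objects.
2. Benefits of adding a P6 layer:
Better large-object detection: the P6 layer has a lower resolution (stride 64) but a larger receptive field, so for large objects the model can capture the overall structure more effectively.
Lower computational cost for large objects: a low-resolution feature map has far fewer positions, so a large object is represented by a handful of cells instead of many.
3. Improved adaptability:
DynamicHead lets the model adjust its detection strategy to objects of different sizes, improving its generalization and adaptability and, in turn, its accuracy.
Summary: adding P2 and P6 layers makes the model more efficient and accurate across object sizes. The strategy is especially suited to datasets from applications that must handle objects of many sizes at once, such as street-scene analysis and drone-based visual surveillance.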
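To put numbers on the resolution argument above, here is a small sketch that counts the grid cells each pyramid level contributes. It assumes a 640×640 input and that level Pk has stride 2^k, which matches the `P2/4` through `P6/64` comments in the YAML files of Section 5; the function name `pyramid_grid` is purely illustrative.

```python
import math


def pyramid_grid(img_size, levels):
    """Map each level name to (stride, grid side length, number of grid cells).

    Assumes level Pk has stride 2**k and a square input of side img_size.
    """
    grid = {}
    for k in levels:
        stride = 2 ** k
        # A 3x3 stride-2 conv with padding 1 maps n pixels to ceil(n / 2)
        side = math.ceil(img_size / stride)
        grid[f"P{k}"] = (stride, side, side * side)
    return grid


p2_model = pyramid_grid(640, [2, 3, 4, 5])  # outputs of the P2 YAML (5.1)
p6_model = pyramid_grid(640, [3, 4, 5, 6])  # outputs of the P6 YAML (5.2)

for name, grid in (("P2-P5", p2_model), ("P3-P6", p6_model)):
    total = sum(cells for _, _, cells in grid.values())
    print(name, grid, "total cells:", total)

# An object about 8 px wide spans roughly 8 / stride cells per level:
# about 2 cells on P2 (stride 4), 1 on P3 (stride 8), and less than one on P4-P6.
```

With a 640 input, P2 alone adds 25,600 cells (160×160), so a P2-P5 head processes 34,000 positions versus 8,400 for a standard P3-P5 head. P6 adds only 100 cells (10×10), each covering a 64×64-pixel patch.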

3. Core Code of DynamicHead

See Section 4 for how to use this code!

from __future__ import annotations  # allows the `list[...]` and `X | Y` annotations below on Python < 3.10

import torch.nn as nn
import torch
import math
import copy
from ultralytics.utils.torch_utils import TORCH_1_11
import torch.nn.functional as F
from mmcv.ops import ModulatedDeformConv2d


def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class h_swish(nn.Module):
    def __init__(self, inplace=False):
        super(h_swish, self).__init__()
        self.inplace = inplace

    def forward(self, x):
        return x * F.relu6(x + 3.0, inplace=self.inplace) / 6.0


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True, h_max=1):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)
        self.h_max = h_max

    def forward(self, x):
        return self.relu(x + 3) * self.h_max / 6


class DYReLU(nn.Module):
    def __init__(self, inp, oup, reduction=4, lambda_a=1.0, K2=True, use_bias=True, use_spatial=False,
                 init_a=[1.0, 0.0], init_b=[0.0, 0.0]):
        super(DYReLU, self).__init__()
        self.oup = oup
        self.lambda_a = lambda_a * 2
        self.K2 = K2
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.use_bias = use_bias
        if K2:
            self.exp = 4 if use_bias else 2
        else:
            self.exp = 2 if use_bias else 1
        self.init_a = init_a
        self.init_b = init_b

        # determine squeeze
        if reduction == 4:
            squeeze = inp // reduction
        else:
            squeeze = _make_divisible(inp // reduction, 4)
        # print('reduction: {}, squeeze: {}/{}'.format(reduction, inp, squeeze))
        # print('init_a: {}, init_b: {}'.format(self.init_a, self.init_b))

        self.fc = nn.Sequential(
            nn.Linear(inp, squeeze),
            nn.ReLU(inplace=True),
            nn.Linear(squeeze, oup * self.exp),
            h_sigmoid()
        )
        if use_spatial:
            self.spa = nn.Sequential(
                nn.Conv2d(inp, 1, kernel_size=1),
                nn.BatchNorm2d(1),
            )
        else:
            self.spa = None

    def forward(self, x):
        if isinstance(x, list):
            x_in = x[0]
            x_out = x[1]
        else:
            x_in = x
            x_out = x
        b, c, h, w = x_in.size()
        y = self.avg_pool(x_in).view(b, c)
        y = self.fc(y).view(b, self.oup * self.exp, 1, 1)
        if self.exp == 4:
            a1, b1, a2, b2 = torch.split(y, self.oup, dim=1)
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]

            b1 = b1 - 0.5 + self.init_b[0]
            b2 = b2 - 0.5 + self.init_b[1]
            out = torch.max(x_out * a1 + b1, x_out * a2 + b2)
        elif self.exp == 2:
            if self.use_bias:  # bias but not PL
                a1, b1 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                b1 = b1 - 0.5 + self.init_b[0]
                out = x_out * a1 + b1
            else:
                a1, a2 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]
                out = torch.max(x_out * a1, x_out * a2)
        elif self.exp == 1:
            a1 = y
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            out = x_out * a1

        if self.spa:
            ys = self.spa(x_in).view(b, -1)
            ys = F.softmax(ys, dim=1).view(b, 1, h, w) * h * w
            ys = F.hardtanh(ys, 0, 3, inplace=True) / 3
            out = out * ys

        return out


class Conv3x3Norm(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(Conv3x3Norm, self).__init__()

        self.conv = ModulatedDeformConv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn = nn.GroupNorm(num_groups=16, num_channels=out_channels)

    def forward(self, input, **kwargs):
        x = self.conv(input.contiguous(), **kwargs)
        x = self.bn(x)
        return x


class DyConv(nn.Module):
    def __init__(self, in_channels=256, out_channels=256, conv_func=Conv3x3Norm):
        super(DyConv, self).__init__()

        self.DyConv = nn.ModuleList()
        self.DyConv.append(conv_func(in_channels, out_channels, 1))
        self.DyConv.append(conv_func(in_channels, out_channels, 1))
        self.DyConv.append(conv_func(in_channels, out_channels, 2))

        self.AttnConv = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, 1, kernel_size=1),
            nn.ReLU(inplace=True))

        self.h_sigmoid = h_sigmoid()
        self.relu = DYReLU(in_channels, out_channels)
        # 27 = 18 offset channels (x/y for each of the 9 kernel taps) + 9 modulation-mask channels
        self.offset = nn.Conv2d(in_channels, 27, kernel_size=3, stride=1, padding=1)
        self.init_weights()

    def init_weights(self):
        for m in self.DyConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()
        for m in self.AttnConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        next_x = {}
        feature_names = list(x.keys())
        for level, name in enumerate(feature_names):

            feature = x[name]

            offset_mask = self.offset(feature)
            offset = offset_mask[:, :18, :, :]
            mask = offset_mask[:, 18:, :, :].sigmoid()
            conv_args = dict(offset=offset, mask=mask)

            # Current level
            temp_fea = [self.DyConv[1](feature, **conv_args)]

            # Finer (higher-resolution) neighbour, brought down with a stride-2 conv
            if level > 0:
                temp_fea.append(self.DyConv[2](x[feature_names[level - 1]], **conv_args))
            # Coarser (lower-resolution) neighbour, upsampled to the current size
            if level < len(x) - 1:
                input = x[feature_names[level + 1]]
                temp_fea.append(F.interpolate(self.DyConv[0](input, **conv_args),
                                              size=[feature.size(2), feature.size(3)]))
            attn_fea = []
            res_fea = []
            for fea in temp_fea:
                res_fea.append(fea)
                attn_fea.append(self.AttnConv(fea))

            res_fea = torch.stack(res_fea)
            spa_pyr_attn = self.h_sigmoid(torch.stack(attn_fea))

            mean_fea = torch.mean(res_fea * spa_pyr_attn, dim=0, keepdim=False)

            next_x[name] = self.relu(mean_fea)

        return next_x


def make_anchors(feats, strides, grid_cell_offset=0.5):
    """Generate anchors from features."""
    anchor_points, stride_tensor = [], []
    assert feats is not None
    dtype, device = feats[0].dtype, feats[0].device
    for i in range(len(feats)):  # use len(feats) to avoid TracerWarning from iterating over strides tensor
        stride = strides[i]
        h, w = feats[i].shape[2:] if isinstance(feats, list) else (int(feats[i][0]), int(feats[i][1]))
        sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # shift x
        sy = torch.arange(end=h, device=device, dtype=dtype) + grid_cell_offset  # shift y
        sy, sx = torch.meshgrid(sy, sx, indexing="ij") if TORCH_1_11 else torch.meshgrid(sy, sx)
        anchor_points.append(torch.stack((sx, sy), -1).view(-1, 2))
        stride_tensor.append(torch.full((h * w, 1), stride, dtype=dtype, device=device))
    return torch.cat(anchor_points), torch.cat(stride_tensor)


def dist2bbox(distance, anchor_points, xywh=True, dim=-1):
    """Transform distance(ltrb) to box(xywh or xyxy)."""
    lt, rb = distance.chunk(2, dim)
    x1y1 = anchor_points - lt
    x2y2 = anchor_points + rb
    if xywh:
        c_xy = (x1y1 + x2y2) / 2
        wh = x2y2 - x1y1
        return torch.cat([c_xy, wh], dim)  # xywh bbox
    return torch.cat((x1y1, x2y2), dim)  # xyxy bbox


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution module with batch normalization and activation.

    Attributes:
        conv (nn.Conv2d): Convolutional layer.
        bn (nn.BatchNorm2d): Batch normalization layer.
        act (nn.Module): Activation function layer.
        default_act (nn.Module): Default activation function (SiLU).
    """

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given parameters.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            k (int): Kernel size.
            s (int): Stride.
            p (int, optional): Padding.
            g (int): Groups.
            d (int): Dilation.
            act (bool | nn.Module): Activation function.
        """
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return self.act(self.conv(x))


class DWConv(Conv):
    """Depth-wise convolution module."""

    def __init__(self, c1, c2, k=1, s=1, d=1, act=True):
        """Initialize depth-wise convolution with given parameters.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            k (int): Kernel size.
            s (int): Stride.
            d (int): Dilation.
            act (bool | nn.Module): Activation function.
        """
        super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), d=d, act=act)


class DFL(nn.Module):
    """Integral module of Distribution Focal Loss (DFL).

    Proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
    """

    def __init__(self, c1: int = 16):
        """Initialize a convolutional layer with a given number of input channels.

        Args:
            c1 (int): Number of input channels.
        """
        super().__init__()
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(c1, dtype=torch.float)
        self.conv.weight.data[:] = nn.Parameter(x.view(1, c1, 1, 1))
        self.c1 = c1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the DFL module to input tensor and return transformed output."""
        b, _, a = x.shape  # batch, channels, anchors
        return self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)
        # return self.conv(x.view(b, self.c1, 4, a).softmax(1)).view(b, 4, a)


class DyHeadDetect(nn.Module):
    """YOLO Detect head for object detection models.

    This class implements the detection head used in YOLO models for predicting bounding boxes and class probabilities.
    It supports both training and inference modes, with optional end-to-end detection capabilities.

    Attributes:
        dynamic (bool): Force grid reconstruction.
        export (bool): Export mode flag.
        format (str): Export format.
        end2end (bool): End-to-end detection mode.
        max_det (int): Maximum detections per image.
        shape (tuple): Input shape.
        anchors (torch.Tensor): Anchor points.
        strides (torch.Tensor): Feature map strides.
        legacy (bool): Backward compatibility for v3/v5/v8/v9/v11 models.
        xyxy (bool): Output format, xyxy or xywh.
        nc (int): Number of classes.
        nl (int): Number of detection layers.
        reg_max (int): DFL channels.
        no (int): Number of outputs per anchor.
        stride (torch.Tensor): Strides computed during build.
        cv2 (nn.ModuleList): Convolution layers for box regression.
        cv3 (nn.ModuleList): Convolution layers for classification.
        dfl (nn.Module): Distribution Focal Loss layer.
        one2one_cv2 (nn.ModuleList): One-to-one convolution layers for box regression.
        one2one_cv3 (nn.ModuleList): One-to-one convolution layers for classification.

    Methods:
        forward: Perform forward pass and return predictions.
        bias_init: Initialize detection head biases.
        decode_bboxes: Decode bounding boxes from predictions.
        postprocess: Post-process model predictions.

    Examples:
        Create a detection head for 80 classes
        >>> detect = Detect(nc=80, ch=(256, 512, 1024))
        >>> x = [torch.randn(1, 256, 80, 80), torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20)]
        >>> outputs = detect(x)
    """

    dynamic = False  # force grid reconstruction
    export = False  # export mode
    format = None  # export format
    max_det = 300  # max_det
    agnostic_nms = False
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init
    legacy = False  # backward compatibility for v3/v5/v8/v9 models
    xyxy = False  # xyxy or xywh output

    def __init__(self, nc: int = 80, reg_max=16, end2end=False, ch: tuple = ()):
        """Initialize the YOLO detection layer with specified number of classes and channels.

        Args:
            nc (int): Number of classes.
            reg_max (int): Maximum number of DFL channels.
            end2end (bool): Whether to use end-to-end NMS-free detection.
            ch (tuple): Tuple of channel sizes from backbone feature maps.
        """
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = reg_max  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100))  # channels
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch
        )
        self.cv3 = (
            nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
            if self.legacy
            else nn.ModuleList(
                nn.Sequential(
                    nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
                    nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
                    nn.Conv2d(c3, self.nc, 1),
                )
                for x in ch
            )
        )
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()
        dyhead_tower = []
        for i in range(self.nl):
            channel = ch[i]
            dyhead_tower.append(
                DyConv(
                    channel,
                    channel,
                    conv_func=Conv3x3Norm,
                )
            )
        self.add_module('dyhead_tower', nn.Sequential(*dyhead_tower))
        # Note: as written, neither forward() nor forward_head() calls self.dyhead_tower, so these
        # DyConv blocks are registered (and counted as parameters), while the box/cls branches
        # receive the input feature maps unchanged.

        if end2end:
            self.one2one_cv2 = copy.deepcopy(self.cv2)
            self.one2one_cv3 = copy.deepcopy(self.cv3)

    @property
    def one2many(self):
        """Returns the one-to-many head components, here for v3/v5/v8/v9/v11 backward compatibility."""
        return dict(box_head=self.cv2, cls_head=self.cv3)

    @property
    def one2one(self):
        """Returns the one-to-one head components."""
        return dict(box_head=self.one2one_cv2, cls_head=self.one2one_cv3)

    @property
    def end2end(self):
        """Checks if the model has one2one for v3/v5/v8/v9/v11 backward compatibility."""
        return getattr(self, "_end2end", True) and hasattr(self, "one2one")

    @end2end.setter
    def end2end(self, value):
        """Override the end-to-end detection mode."""
        self._end2end = value

    def forward_head(
        self, x: list[torch.Tensor], box_head: torch.nn.Module = None, cls_head: torch.nn.Module = None
    ) -> dict[str, torch.Tensor]:
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        if box_head is None or cls_head is None:  # for fused inference
            return dict()
        bs = x[0].shape[0]  # batch size
        boxes = torch.cat([box_head[i](x[i]).view(bs, 4 * self.reg_max, -1) for i in range(self.nl)], dim=-1)
        scores = torch.cat([cls_head[i](x[i]).view(bs, self.nc, -1) for i in range(self.nl)], dim=-1)
        return dict(boxes=boxes, scores=scores, feats=x)

    def forward(
        self, x: list[torch.Tensor]
    ) -> dict[str, torch.Tensor] | torch.Tensor | tuple[torch.Tensor, dict[str, torch.Tensor]]:
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        preds = self.forward_head(x, **self.one2many)
        if self.end2end:
            x_detach = [xi.detach() for xi in x]
            one2one = self.forward_head(x_detach, **self.one2one)
            preds = {"one2many": preds, "one2one": one2one}
        if self.training:
            return preds
        y = self._inference(preds["one2one"] if self.end2end else preds)
        if self.end2end:
            y = self.postprocess(y.permute(0, 2, 1))
        return y if self.export else (y, preds)

    def _inference(self, x: dict[str, torch.Tensor]) -> torch.Tensor:
        """Decode predicted bounding boxes and class probabilities based on multiple-level feature maps.

        Args:
            x (dict[str, torch.Tensor]): Dictionary of predictions from detection layers.

        Returns:
            (torch.Tensor): Concatenated tensor of decoded bounding boxes and class probabilities.
        """
        # Inference path
        dbox = self._get_decode_boxes(x)
        return torch.cat((dbox, x["scores"].sigmoid()), 1)

    def _get_decode_boxes(self, x: dict[str, torch.Tensor]) -> torch.Tensor:
        """Get decoded boxes based on anchors and strides."""
        shape = x["feats"][0].shape  # BCHW
        if self.dynamic or self.shape != shape:
            self.anchors, self.strides = (a.transpose(0, 1) for a in make_anchors(x["feats"], self.stride, 0.5))
            self.shape = shape

        dbox = self.decode_bboxes(self.dfl(x["boxes"]), self.anchors.unsqueeze(0)) * self.strides
        return dbox

    def bias_init(self):
        """Initialize Detect() biases, WARNING: requires stride availability."""
        for i, (a, b) in enumerate(zip(self.one2many["box_head"], self.one2many["cls_head"])):  # from
            a[-1].bias.data[:] = 2.0  # box
            b[-1].bias.data[: self.nc] = math.log(
                5 / self.nc / (640 / self.stride[i]) ** 2
            )  # cls (.01 objects, 80 classes, 640 img)
        if self.end2end:
            for i, (a, b) in enumerate(zip(self.one2one["box_head"], self.one2one["cls_head"])):  # from
                a[-1].bias.data[:] = 2.0  # box
                b[-1].bias.data[: self.nc] = math.log(
                    5 / self.nc / (640 / self.stride[i]) ** 2
                )  # cls (.01 objects, 80 classes, 640 img)

    def decode_bboxes(self, bboxes: torch.Tensor, anchors: torch.Tensor, xywh: bool = True) -> torch.Tensor:
        """Decode bounding boxes from predictions."""
        return dist2bbox(
            bboxes,
            anchors,
            xywh=xywh and not self.end2end and not self.xyxy,
            dim=1,
        )

    def postprocess(self, preds: torch.Tensor) -> torch.Tensor:
        """Post-processes YOLO model predictions.

        Args:
            preds (torch.Tensor): Raw predictions with shape (batch_size, num_anchors, 4 + nc) with last dimension
                format [x1, y1, x2, y2, class_probs].

        Returns:
            (torch.Tensor): Processed predictions with shape (batch_size, min(max_det, num_anchors), 6) and last
                dimension format [x1, y1, x2, y2, max_class_prob, class_index].
        """
        boxes, scores = preds.split([4, self.nc], dim=-1)
        scores, conf, idx = self.get_topk_index(scores, self.max_det)
        boxes = boxes.gather(dim=1, index=idx.repeat(1, 1, 4))
        return torch.cat([boxes, scores, conf], dim=-1)

    def get_topk_index(self, scores: torch.Tensor, max_det: int) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Get top-k indices from scores.

        Args:
            scores (torch.Tensor): Scores tensor with shape (batch_size, num_anchors, num_classes).
            max_det (int): Maximum detections per image.

        Returns:
            (torch.Tensor, torch.Tensor, torch.Tensor): Top scores, class indices, and filtered indices.
        """
        batch_size, anchors, nc = scores.shape  # i.e. shape(16,8400,84)
        # Use max_det directly during export for TensorRT compatibility (requires k to be constant),
        # otherwise use min(max_det, anchors) for safety with small inputs during Python inference
        k = max_det if self.export else min(max_det, anchors)
        if self.agnostic_nms:
            scores, labels = scores.max(dim=-1, keepdim=True)
            scores, indices = scores.topk(k, dim=1)
            labels = labels.gather(1, indices)
            return scores, labels, indices
        ori_index = scores.max(dim=-1)[0].topk(k)[1].unsqueeze(-1)
        scores = scores.gather(dim=1, index=ori_index.repeat(1, 1, nc))
        scores, index = scores.flatten(1).topk(k)
        idx = ori_index[torch.arange(batch_size)[..., None], index // nc]  # original index
        return scores[..., None], (index % nc)[..., None].float(), idx

    def fuse(self) -> None:
        """Remove the one2many head for inference optimization."""
        self.cv2 = self.cv3 = None

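4. Step-by-Step Guide to Adding the DynamicHead Detection Head

4.1 Modification 1

First, create a new .py file in the 'ultralytics/nn' directory and paste the code above into it. The file name is up to you; I named mine DynamicHead.py.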
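Because the file created in this step imports `torch`, `mmcv.ops` and `ultralytics` at the top, it fails at import time if any of them is missing. A quick pre-flight check, sketched below with the standard library's `importlib.util.find_spec`, can save a confusing traceback later. Locating the `mmcv` package does not prove that its compiled operators (which `ModulatedDeformConv2d` needs) were built, so running `from mmcv.ops import ModulatedDeformConv2d` yourself remains the real test. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Top-level packages imported by DynamicHead.py (torch, mmcv.ops, ultralytics.utils.torch_utils)
REQUIRED_PACKAGES = ("torch", "mmcv", "ultralytics")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install these before importing DynamicHead.py:", ", ".join(missing))
    else:
        print("torch, mmcv and ultralytics were all found.")
```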

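4.2 Modification 2

Second, create a new file named '__init__.py' in that directory (if you are using the files from the group, it already exists and you do not need to create it), then import our detection head in it, as shown in the figure below.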

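4.3 Modification 3

Third, open 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from the group, this is already done and you can go straight to step 4)!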

4.4 Modification 4

Fourth, in 'ultralytics/nn/tasks.py', find the code shown below and add the detection head to it. A quick way to locate it: press Ctrl+F and search for Detect.
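Why do steps 3 and 4 matter? As far as I understand recent Ultralytics releases, `parse_model` in `tasks.py` turns the module string in each YAML row (here `DyHeadDetect`) into a class by looking it up among the names visible in `tasks.py`, and heads listed alongside `Detect` get the channel counts of their input layers appended to their arguments. The sketch below reproduces that idea with placeholder classes; `resolve`, `head_args`, `TASKS_NAMESPACE` and `HEADS_NEEDING_CH` are hypothetical names, not Ultralytics APIs, and the real code differs in detail between versions.

```python
class Detect:
    """Placeholder for the stock Ultralytics Detect head."""


class DyHeadDetect(Detect):
    """Placeholder for the DyHeadDetect class defined in DynamicHead.py."""


# Hypothetical equivalent of step 3: the import makes the name visible in tasks.py
TASKS_NAMESPACE = {"Detect": Detect, "DyHeadDetect": DyHeadDetect}

# Hypothetical equivalent of step 4: heads that receive the channels of their input layers
HEADS_NEEDING_CH = {Detect, DyHeadDetect}


def resolve(name, namespace=TASKS_NAMESPACE):
    """Turn the module string from a YAML row into a class object."""
    if name not in namespace:
        raise KeyError(f"{name!r} is not imported in tasks.py (step 3 missing?)")
    return namespace[name]


def head_args(module, args, from_layers, out_channels):
    """Append the input-layer channel list for registered heads, as step 4 enables."""
    if module in HEADS_NEEDING_CH:
        return [*args, [out_channels[i] for i in from_layers]]
    return list(args)


# YAML row from 5.1: [[19, 22, 25, 28], 1, DyHeadDetect, [nc]]
# Assumed output channels of those layers at scale 'n' (width 0.25)
channels = {19: 32, 22: 64, 25: 128, 28: 256}
module = resolve("DyHeadDetect")
print(module.__name__, head_args(module, [80], [19, 22, 25, 28], channels))
```

If step 3 is skipped, the name lookup fails outright. If the head were constructed without a channel list, `ch` would stay empty and `ch[0]` in the `__init__` from Section 3 would raise an `IndexError`.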

4.5 Modification 5

Same as above.

4.6 Modification 6

Same as above.

4.7 Modification 7

This one is slightly different: we need to add one line of code:

else:
    return 'detect'

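Why is this step different? When this code runs, m (the name of the head module) is converted to lowercase, so 'DyHeadDetect' becomes 'dyheaddetect' and no longer matches any of the existing names exactly. Adding a plain else branch is the simplest fix. I will provide other modification guides later, when tutorials for segmentation or other tasks come up.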
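The reasoning behind step 7 can be illustrated with a simplified, hypothetical version of the task-guessing logic. It assumes, as the explanation above describes, that the module name in the last head row is lowercased and compared against exact task names; `guess_task` and `KNOWN_HEADS` are illustrative names, and the actual Ultralytics function may use different checks depending on the version.

```python
# Exact (lowercase) head names mapped to tasks, as assumed for this sketch
KNOWN_HEADS = {
    "classify": "classify",
    "detect": "detect",
    "segment": "segment",
    "pose": "pose",
    "obb": "obb",
}


def guess_task(head_rows, fallback_to_detect=True):
    """Infer the task from the module name in the last row of a YAML 'head' section."""
    m = head_rows[-1][-2].lower()  # "DyHeadDetect" -> "dyheaddetect"
    if m in KNOWN_HEADS:
        return KNOWN_HEADS[m]
    if fallback_to_detect:
        return "detect"  # the added `else: return 'detect'` branch
    return None  # without the fallback nothing matches and no task is inferred


last_row = [[19, 22, 25, 28], 1, "DyHeadDetect", ["nc"]]
print(guess_task([last_row]))                            # with the step-7 fallback
print(guess_task([last_row], fallback_to_detect=False))  # without it
```

Because "dyheaddetect" never equals "detect", an exact-match check alone leaves the task undetermined; the fallback routes any unrecognized head to detection, which is also why segmentation or other heads would need their own changes.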

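4.8 Modification 8

Same as above.

That completes the modifications; you can now copy the YAML files below and run them.
Note: if one of the steps above was done incorrectly, the model may still run, but its accuracy may stay at 0 or training may fail to converge.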

5. YAML Files for the DynamicHead Detection Head

5.1 YAML File Fusing DynamicHead with P2

Training information for this version: YOLO26-p2-DyHead summary: 394 layers, 5,099,768 parameters, 5,099,768 gradients, 7.5 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P2/4 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-p2.yaml' will call yolo26-p2.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 329 layers, 2,662,400 parameters, 2,662,400 gradients, 9.5 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 329 layers, 9,765,856 parameters, 9,765,856 gradients, 27.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 349 layers, 21,144,288 parameters, 21,144,288 gradients, 91.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 489 layers, 25,815,520 parameters, 25,815,520 gradients, 115.3 GFLOPs
  x: [1.00, 1.50, 512] # summary: 489 layers, 57,935,232 parameters, 57,935,232 gradients, 256.9 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 2, C3k2, [128, True]] # 19 (P2/4-xsmall)

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 16], 1, Concat, [1]] # cat head P3
  - [-1, 2, C3k2, [256, True]] # 22 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 25 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 28 (P5/32-large)

  - [[19, 22, 25, 28], 1, DyHeadDetect, [nc]] # Detect(P2, P3, P4, P5)
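The `scales` table decides how wide each layer actually is. Assuming Ultralytics' usual rule, where a layer's output channels become `min(c2, max_channels) * width` rounded up to a multiple of 8, the sketch below computes the channel counts of the four layers that feed `DyHeadDetect` in this YAML (layers 19, 22, 25 and 28, declared with 128, 256, 512 and 1024 channels). These are the per-level `ch` values the head's `__init__` would receive; the helper names are hypothetical.

```python
import math

# [depth, width, max_channels] from the scales table above
SCALES = {
    "n": (0.50, 0.25, 1024),
    "s": (0.50, 0.50, 1024),
    "m": (0.50, 1.00, 512),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.50, 512),
}

# Declared output channels of the layers feeding DyHeadDetect (P2, P3, P4, P5)
HEAD_INPUTS = {19: 128, 22: 256, 25: 512, 28: 1024}


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def head_channels(scale):
    """Scaled channel count of each head input layer for one model scale (assumed rule)."""
    _, width, max_channels = SCALES[scale]
    return [make_divisible(min(c, max_channels) * width) for c in HEAD_INPUTS.values()]


for scale in SCALES:
    print(scale, head_channels(scale))
```

Note that `max_channels` caps the P5 input at 512 for the m, l and x scales, so P4 and P5 end up equally wide there.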

5.2 YAML File Fusing DynamicHead with P6

Training information for this version: YOLO26-p6-DyHead summary: 414 layers, 7,610,432 parameters, 7,610,432 gradients, 5.7 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P6/64 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-p6.yaml' will call yolo26-p6.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 349 layers, 4,063,872 parameters, 4,063,872 gradients, 6.0 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 349 layers, 15,876,448 parameters, 15,876,448 gradients, 22.3 GFLOPs
  m: [0.50, 1.00, 512] # summary: 369 layers, 32,400,096 parameters, 32,400,096 gradients, 77.3 GFLOPs
  l: [1.00, 1.00, 512] # summary: 523 layers, 39,365,600 parameters, 39,365,600 gradients, 97.0 GFLOPs
  x: [1.00, 1.50, 512] # summary: 523 layers, 88,330,368 parameters, 88,330,368 gradients, 216.6 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [768, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [768, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 9-P6/64
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 11
  - [-1, 2, C2PSA, [1024]] # 12

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 8], 1, Concat, [1]] # cat backbone P5
  - [-1, 2, C3k2, [768, True]] # 15

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 18

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 21 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 18], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 24 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 15], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [768, True]] # 27 (P5/32-large)

  - [-1, 1, Conv, [768, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P6
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 30 (P6/64-large)

  - [[21, 24, 27, 30], 1, DyHeadDetect, [nc]] # Detect(P3, P4, P5, P6)
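When editing a YAML like this one, it is easy to point a `Concat` at the wrong layer index. The sketch below traces the output stride of every layer in the P6 configuration using a simplified encoding (`"down"` for the stride-2 `Conv` layers, `"up"` for `nn.Upsample`, `"same"` for `C3k2`, `SPPF` and `C2PSA`, `"concat"` for `Concat`) and checks that each `Concat` joins feature maps of equal stride. The encoding and the function name `trace_strides` are assumptions made for illustration, not an Ultralytics utility.

```python
def trace_strides(layers):
    """Return the output stride of each layer; raise if a concat mixes strides."""
    strides = []
    for idx, (frm, kind) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        # -1 means "previous layer"; the very first layer reads the image (stride 1)
        in_strides = [strides[i] if strides else 1 for i in sources]
        if kind == "concat":
            if len(set(in_strides)) != 1:
                raise ValueError(f"layer {idx}: Concat of mismatched strides {in_strides}")
            stride = in_strides[0]
        elif kind == "down":
            stride = in_strides[0] * 2
        elif kind == "up":
            stride = in_strides[0] // 2
        else:  # "same": resolution is preserved
            stride = in_strides[0]
        strides.append(stride)
    return strides


# Layers 0-30 of the P6 YAML above, in order (the DyHeadDetect row itself is omitted)
P6_LAYERS = [
    (-1, "down"), (-1, "down"), (-1, "same"),             # 0-2
    (-1, "down"), (-1, "same"),                           # 3-4   P3/8
    (-1, "down"), (-1, "same"),                           # 5-6   P4/16
    (-1, "down"), (-1, "same"),                           # 7-8   P5/32
    (-1, "down"), (-1, "same"),                           # 9-10  P6/64
    (-1, "same"), (-1, "same"),                           # 11 SPPF, 12 C2PSA
    (-1, "up"), ([-1, 8], "concat"), (-1, "same"),        # 13-15
    (-1, "up"), ([-1, 6], "concat"), (-1, "same"),        # 16-18
    (-1, "up"), ([-1, 4], "concat"), (-1, "same"),        # 19-21 P3 output
    (-1, "down"), ([-1, 18], "concat"), (-1, "same"),     # 22-24 P4 output
    (-1, "down"), ([-1, 15], "concat"), (-1, "same"),     # 25-27 P5 output
    (-1, "down"), ([-1, 12], "concat"), (-1, "same"),     # 28-30 P6 output
]

strides = trace_strides(P6_LAYERS)
print({i: strides[i] for i in (21, 24, 27, 30)})
```

With the file as written, layers 21, 24, 27 and 30 come out at strides 8, 16, 32 and 64, matching the `Detect(P3, P4, P5, P6)` comment; changing `[-1, 8]` to `[-1, 6]` in layer 14, for example, makes the check raise.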