当前位置：首页 > news >正文

YOLO V8-Detection 【批量图片推理】推理详解及部署实现

news 2026/7/18 23:10:10

前言

在实际处理过程中，我们使用YOLO V8进行推理时，通常会针对一张图片进行推理。如果需要对多张图片进行推理，则可以通过一个循环来实现对图片逐张进行推理。

单张图片推理时，需要注意图片的尺寸必须是32的倍数，否则可能导致推理失败。在下面的示例中，我们展示了如何使用PyTorch和Ultralytics库进行单张图片的推理：

import torch from ultralytics import YOLO # Load a pretrained YOLOv8n model model = YOLO('yolov8n.pt') # Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32 source = torch.rand(1, 3, 640, 640, dtype=torch.float32) # Run inference on the source results = model(source) # list of Results objects

批量图片推理时，也需要注意图片的尺寸必须是32的倍数。在下面的示例中，我们展示了如何使用PyTorch和Ultralytics库进行多张图片的批量推理：

import torch from ultralytics import YOLO # Load a pretrained YOLOv8n model model = YOLO('yolov8n.pt') # Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32 source = torch.rand(4, 3, 640, 640, dtype=torch.float32) # Run inference on the source results = model(source) # list of Results objects

需要注意的是，在批量推理时，虽然一次推理了多张图片，但实际处理方式仍然是通过循环进行的。在下面的文章中，我们将介绍如何使用更高效的方式进行批量推理，以获得更快的推理速度和更好的性能。

下面我们介绍如何将【单张图片推理】检测代码给修改成【批量图片推理】代码，进行批量推理。

一、批量推理的前处理

原始代码

@staticmethod def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), scaleup=True, stride=32): # Resize and pad image while meeting stride-multiple constraints shape = im.shape[:2] # current shape [height, width] if isinstance(new_shape, int): new_shape = (new_shape, new_shape) # Scale ratio (new / old) r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) if not scaleup: # only scale down, do not scale up (for better val mAP) r = min(r, 1.0) # Compute padding ratio = r, r # width, height ratios new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding # minimum rectangle dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding dw /= 2 # divide padding into 2 sides dh /= 2 if shape[::-1] != new_unpad: # resize im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border return im, ratio, (dw, dh) def precess_image(self, img_src, img_size, half, device): # Padded resize img = self.letterbox(img_src, img_size)[0] # Convert img = img.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB img = np.ascontiguousarray(img) img = torch.from_numpy(img).to(device) img = img.half() if half else img.float() # uint8 to fp16/32 img = img / 255 # 0 - 255 to 0.0 - 1.0 if len(img.shape) == 3: img = img[None] # expand for batch dim return img

处理方式

我们要先知道在原始处理方式中是如何操作的：

它包含以下步骤：

self.pre_transform：即 letterbox 添加灰条
img.transpose((2, 0, 1))[::-1]：HWC to CHW, BGR to RGB
torch.from_numpy：to Tensor
img.float()：uint8 to fp32
im /= 255：除以 255，归一化
img[None]：增加维度

在上述处理过程中我们最主要进行修改的就是self.pre_transform里面的操作，其余部分都是可以直接进行批量操作的。

在letterbox中最主要的操作就是下面两个函数，使用 opencv 进行实现的。我们要进行批量操作，那么 opencv 库是不能实现的，进行批量操作一般会用广播机制或者 tensor操作来实现。

im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

由于最终输入到模型里面的是一个tensor，所以在这里我们使用 tensor的操作方式进行实现。

尺寸修改

原始方法： im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR) 现在方法： resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)

两者的实现效果：

原始方法：(1176, 1956, 3) --》(385, 640, 3)

现在方法：(1176, 1956, 3) --》(385, 640, 3)

添加边框

原始方式： im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) 现在方法： padded_tensor = F.pad(resized_tensor, (top, bottom, left, right), mode='constant', value=padding_value)

两者的实现效果：

原始方法：(385, 640, 3) --》(416, 640, 3)

现在方法：(385, 640, 3) --》(416, 640, 3)

修改后的代码

def tensor_process(self, image_cv): img_shape = image_cv.shape[1:] new_shape = [640, 640] r = min(new_shape[0] / img_shape[0], new_shape[1] / img_shape[1]) # Compute padding new_unpad = int(round(img_shape[0] * r)), int(round(img_shape[1] * r)) dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding dw, dh = np.mod(dw, 32), np.mod(dh, 32) # wh padding dw /= 2 # divide padding into 2 sides dh /= 2 top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) padding_value = 114 image_tensor = torch.from_numpy(image_cv).permute(0, 3, 1, 2).float() image_tensor = image_tensor.to(self.device) resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False) padded_tensor = F.pad(resized_tensor, (top, bottom, left, right), mode='constant', value=padding_value) infer_tensor = padded_tensor / 255.0 return infer_tensor

二、批量推理的后处理

原始代码

def non_max_suppression( prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False, labels=(), max_det=300, nc=0, # number of classes (optional) max_time_img=0.05, max_nms=30000, max_wh=7680, rotated=False, ): """ Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box. Args: prediction (torch.Tensor): A tensor of shape (batch_size, num_classes + 4 + num_masks, num_boxes) containing the predicted boxes, classes, and masks. The tensor should be in the format output by a model, such as YOLO. conf_thres (float): The confidence threshold below which boxes will be filtered out. Valid values are between 0.0 and 1.0. iou_thres (float): The IoU threshold below which boxes will be filtered out during NMS. Valid values are between 0.0 and 1.0. classes (List[int]): A list of class indices to consider. If None, all classes will be considered. agnostic (bool): If True, the model is agnostic to the number of classes, and all classes will be considered as one. multi_label (bool): If True, each box may have multiple labels. labels (List[List[Union[int, float, torch.Tensor]]]): A list of lists, where each inner list contains the apriori labels for a given image. The list should be in the format output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2). max_det (int): The maximum number of boxes to keep after NMS. nc (int, optional): The number of classes output by the model. Any indices after this will be considered masks. max_time_img (float): The maximum time (seconds) for processing one image. max_nms (int): The maximum number of boxes into torchvision.ops.nms(). max_wh (int): The maximum box width and height in pixels Returns: (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...). """ # Checks assert 0 <= conf_thres <= 1, f"Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0" assert 0 <= iou_thres <= 1, f"Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0" if isinstance(prediction, (list, tuple)): # YOLOv8 model in validation model, output = (inference_out, loss_out) prediction = prediction[0] # select only inference output bs = prediction.shape[0] # batch size nc = nc or (prediction.shape[1] - 4) # number of classes nm = prediction.shape[1] - nc - 4 mi = 4 + nc # mask start index xc = prediction[:, 4:mi].amax(1) > conf_thres # candidates # Settings # min_wh = 2 # (pixels) minimum box width and height time_limit = 0.5 + max_time_img * bs # seconds to quit after multi_label &= nc > 1 # multiple labels per box (adds 0.5ms/img) prediction = prediction.transpose(-1, -2) # shape(1,84,6300) to shape(1,6300,84) if not rotated: prediction[..., :4] = xywh2xyxy(prediction[..., :4]) # xywh to xyxy t = time.time() output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs for xi, x in enumerate(prediction): # image index, image inference # Apply constraints # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0 # width-height x = x[xc[xi]] # confidence # Cat apriori labels if autolabelling if labels and len(labels[xi]) and not rotated: lb = labels[xi] v = torch.zeros((len(lb), nc + nm + 4), device=x.device) v[:, :4] = xywh2xyxy(lb[:, 1:5]) # box v[range(len(lb)), lb[:, 0].long() + 4] = 1.0 # cls x = torch.cat((x, v), 0) # If none remain process next image if not x.shape[0]: continue # Detections matrix nx6 (xyxy, conf, cls) box, cls, mask = x.split((4, nc, nm), 1) if multi_label: i, j = torch.where(cls > conf_thres) x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1) else: # best class only conf, j = cls.max(1, keepdim=True) x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres] # Filter by class if classes is not None: x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)] # Check shape n = x.shape[0] # number of boxes if not n: # no boxes continue if n > max_nms: # excess boxes x = x[x[:, 4].argsort(descending=True)[:max_nms]] # sort by confidence and remove excess boxes # Batched NMS c = x[:, 5:6] * (0 if agnostic else max_wh) # classes scores = x[:, 4] # scores if rotated: boxes = torch.cat((x[:, :2] + c, x[:, 2:4], x[:, -1:]), dim=-1) # xywhr i = nms_rotated(boxes, scores, iou_thres) else: boxes = x[:, :4] + c # boxes (offset by class) i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS i = i[:max_det] # limit detections # # Experimental # merge = False # use merge-NMS # if merge and (1 < n < 3E3): # Merge NMS (boxes merged using weighted mean) # # Update boxes as boxes(i,4) = weights(i,n) * boxes(n,4) # from .metrics import box_iou # iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix # weights = iou * scores[None] # box weights # x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True) # merged boxes # redundant = True # require redundant detections # if redundant: # i = i[iou.sum(1) > 1] # require redundancy output[xi] = x[i] if (time.time() - t) > time_limit: LOGGER.warning(f"WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded") break # time limit exceeded return output

处理方式

我们要先知道在原始处理方式中是如何操作的。

在这个里面，最主要的操作就是nms 操作，这里的 nms 操作就是一张图一张图的结果进行处理，不是多张图的结果一起处理，我们最主要的就是要修改这里的代码。

但是在这里，我们要先理解在原始的处理方式是怎样的逻辑。

原始 nms 逻辑

在这里就只给出关键步骤：

计算第4列到第mi列中的最大值，然后将这个最大值与conf_thres进行比较，得到一个布尔值结果。最终的输出是一个布尔张量，表示每一行是否存在大于conf_thres的最大值。

xc = prediction[:, 4:mi].amax(1) > conf_thres # candidates

将原始预测框中的 xywh 转为 xyxy

prediction[..., :4] = xywh2xyxy(prediction[..., :4])

从原始结果中选择出为True的结果，得到初步的筛选结果

x = x[xc[xi]] # confidence

分离出标注框，类别，掩码

box, cls, mask = x.split((4, nc, nm), 1)

再次根据 cls 进行筛选，并拼接成新的推理结果

conf, j = cls.max(1, keepdim=True) x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

计算nms

boxes = x[:, :4] + c # boxes (offset by class) i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS

只选出前 max_det的输出结果，避免有多余的输出

i = i[:max_det] # limit detections

现在 nms 逻辑

将prediction 中所有批次的结果进行统一，处理成一个批次，在将这个批次送入到 batched_nms 中，最后在进行处理，得到标注框，类别，置信度。

筛选出为true的索引（批次数）和行数

true_indices = torch.nonzero(xc)

根据索引和行数筛选出 prediction 中真实的结果，注意：这个结果是所有的批次的结果

selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]]

将批次数添加在筛选出的结果中，用于区分出那个结果是那个批次的。注意：这个的批次也可以看成是对应的图片

new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1)

分割出标注框、类别、掩码、索引（批次）

box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1)

筛选出最大的类别的置信和类别索引

conf, j = cls.max(1, keepdim=True)

根据类别置信度再次进行筛选，选出符合的结果，并进行拼接，得到一个新的结果

x = torch.cat((box, conf, j.float()), 1)[conf.squeeze(-1) > conf_thres]

将标注框，置信度，索引，iou值送入到 batched_nms 中，选出最终的预测结果的索引标签。

cls = x[:, 5] # classes c = x[:, 5:6] * (0 if agnostic else max_wh) boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores idxs = idxs.t().squeeze(0) keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres)

batched_nms：以批处理方式执行非最大值抑制。每个索引值对应一个类别，NMS不会应用于不同类别的元素之间。

参数：
boxes （Tensor[N, 4]）：标注框，为(x1, y1, x2, y2)格式，其中0 <= x1 < x2和0 <= y1 < y2。
scores （Tensor[N]）：每个标注框的得分
idxs （Tensor[N]）：每个标注框的类别索引。
iou_threshold（float）：剔除 IoU > iou_threshold 的所有重叠方框
返回值：
Tensor：int64，为 NMS 保留的元素索引，按分数递减排序

def batched_nms(boxes: Tensor, scores: Tensor, idxs: Tensor, iou_threshold: float,) -> Tensor:

根据 nms 的筛选结果，选择出最终的预测结果

boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape) boxes = boxes[keep].cpu().numpy().tolist() scores = scores[keep].cpu().numpy().tolist() cls = cls[keep].cpu().numpy().tolist() idxs = idxs[keep].cpu().numpy().tolist()

修改后的代码

def non_max_suppression(self, prediction, inferShape, orgShape, conf_thres=0.25, iou_thres=0.45, agnostic=True, multi_label=False, max_wh=7680, nc=0): prediction = prediction[0] # select only inference output nc = nc # number of classes nm = prediction.shape[1] - nc - 4 mi = 4 + nc # mask start index xc = prediction[:, 4:mi].amax(1) > conf_thres # candidates # Settings multi_label &= nc > 1 # multiple labels per box (adds 0.5ms/img) prediction = prediction.transpose(-1, -2) # shape(1,84,6300) to shape(1,6300,84) prediction[..., :4] = self.xywh2xyxy(prediction[..., :4]) # xywh to xyxy true_indices = torch.nonzero(xc) selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]] new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1) if new_prediction.shape[0] == 0: return box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1) conf, j = cls.max(1, keepdim=True) x = torch.cat((box, conf, j.float()), 1)[conf.squeeze(-1) > conf_thres] if not x.shape[0]: # no boxes return cls = x[:, 5] # classes c = x[:, 5:6] * (0 if agnostic else max_wh) boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores idxs = idxs.t().squeeze(0) keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres) boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape) boxes = boxes[keep].cpu().numpy() scores = scores[keep].cpu().numpy() cls = cls[keep].cpu().numpy() idxs = idxs[keep].cpu().numpy() results = np.hstack((boxes, np.expand_dims(scores, axis=1))) results = np.hstack((results, np.expand_dims(cls, axis=1))) results = np.hstack((results, np.expand_dims(idxs, axis=1))) return results

三、完整代码

通过上面的解析，我们了解了 YOLO V8-Detection 如何进行批量的推理图片的方法，并对每一步进行了实现。

完整的推理代码如下：

# -*- coding:utf-8 -*- # @author: 牧锦程 # @微信公众号: AI算法与电子竞赛 # @Email: m21z50c71@163.com # @VX：fylaicai import os.path import random import cv2 import numpy as np import torch import torchvision from ultralytics.nn.autobackend import AutoBackend import torch.nn.functional as F class YOLOV8DetectionInfer: def __init__(self, weights, cuda, conf_thres, iou_thres) -> None: self.imgsz = 640 self.device = cuda self.model = AutoBackend(weights, device=torch.device(cuda)) self.model.eval() self.names = self.model.names self.conf = conf_thres self.iou = iou_thres self.color = {"font": (255, 255, 255)} self.color.update( {self.names[i]: (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) for i in range(len(self.names))}) def infer(self, img_path, save_path): img_src = cv2.imread(img_path) img_array = np.array([img_src]) img = self.tensor_process(img_array) preds = self.model(img) results = self.non_max_suppression(preds, img.shape[2:], img_src.shape, self.conf, self.iou, nc=len(self.names)) for result in results: self.draw_box(img_array[int(result[6])], result[:4], result[4], self.names[result[5]]) for i in range(img_array.shape[0]): cv2.imwrite(os.path.join(save_path, f"{i}.jpg"), img_array[i]) def draw_box(self, img_src, box, conf, cls_name): lw = max(round(sum(img_src.shape) / 2 * 0.003), 2) # line width tf = max(lw - 1, 1) # font thickness sf = lw / 3 # font scale color = self.color[cls_name] label = f'{cls_name} {conf:.4f}' p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3])) # 绘制矩形框 cv2.rectangle(img_src, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA) # text width, height w, h = cv2.getTextSize(label, 0, fontScale=sf, thickness=tf)[0] # label fits outside box outside = box[1] - h - 3 >= 0 p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3 # 绘制矩形框填充 cv2.rectangle(img_src, p1, p2, color, -1, cv2.LINE_AA) # 绘制标签 cv2.putText(img_src, label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2), 0, sf, self.color["font"], thickness=2, lineType=cv2.LINE_AA) def tensor_process(self, image_cv): img_shape = image_cv.shape[1:] new_shape = [640, 640] r = min(new_shape[0] / img_shape[0], new_shape[1] / img_shape[1]) # Compute padding new_unpad = int(round(img_shape[0] * r)), int(round(img_shape[1] * r)) dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding dw, dh = np.mod(dw, 32), np.mod(dh, 32) # wh padding dw /= 2 # divide padding into 2 sides dh /= 2 top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) padding_value = 114 # Convert image_cv = image_cv[..., ::-1].transpose((0, 3, 1, 2)) # BGR to RGB, BHWC to BCHW, (n, 3, h, w) image_cv = np.ascontiguousarray(image_cv) # contiguous image_tensor = torch.from_numpy(image_cv).float() image_tensor = image_tensor.to(self.device) resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False) padded_tensor = F.pad(resized_tensor, (top, bottom, left, right), mode='constant', value=padding_value) infer_tensor = padded_tensor / 255.0 return infer_tensor def non_max_suppression(self, prediction, inferShape, orgShape, conf_thres=0.25, iou_thres=0.45, agnostic=True, multi_label=False, max_wh=7680, nc=0): prediction = prediction[0] # select only inference output nc = nc # number of classes nm = prediction.shape[1] - nc - 4 mi = 4 + nc # mask start index xc = prediction[:, 4:mi].amax(1) > conf_thres # candidates # Settings multi_label &= nc > 1 # multiple labels per box (adds 0.5ms/img) prediction = prediction.transpose(-1, -2) # shape(1,84,6300) to shape(1,6300,84) prediction[..., :4] = self.xywh2xyxy(prediction[..., :4]) # xywh to xyxy true_indices = torch.nonzero(xc) selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]] new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1) if new_prediction.shape[0] == 0: return box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1) conf, j = cls.max(1, keepdim=True) x = torch.cat((box, conf, j.float()), 1)[conf.squeeze(-1) > conf_thres] if not x.shape[0]: # no boxes return cls = x[:, 5] # classes c = x[:, 5:6] * (0 if agnostic else max_wh) boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores idxs = idxs.t().squeeze(0) keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres) boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape) boxes = boxes[keep].cpu().numpy() scores = scores[keep].cpu().numpy() cls = cls[keep].cpu().numpy() idxs = idxs[keep].cpu().numpy() results = np.hstack((boxes, np.expand_dims(scores, axis=1))) results = np.hstack((results, np.expand_dims(cls, axis=1))) results = np.hstack((results, np.expand_dims(idxs, axis=1))) return results def xywh2xyxy(self, x): assert x.shape[-1] == 4, f"input shape last dimension expected 4 but input shape is {x.shape}" y = torch.empty_like(x) if isinstance(x, torch.Tensor) else np.empty_like(x) # faster than clone/copy dw = x[..., 2] / 2 # half-width dh = x[..., 3] / 2 # half-height y[..., 0] = x[..., 0] - dw # top left x y[..., 1] = x[..., 1] - dh # top left y y[..., 2] = x[..., 0] + dw # bottom right x y[..., 3] = x[..., 1] + dh # bottom right y return y def clip_boxes(self, boxes, shape): if isinstance(boxes, torch.Tensor): # faster individually (WARNING: inplace .clamp_() Apple MPS bug) boxes[..., 0] = boxes[..., 0].clamp(0, shape[1]) # x1 boxes[..., 1] = boxes[..., 1].clamp(0, shape[0]) # y1 boxes[..., 2] = boxes[..., 2].clamp(0, shape[1]) # x2 boxes[..., 3] = boxes[..., 3].clamp(0, shape[0]) # y2 else: # np.array (faster grouped) boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1]) # x1, x2 boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0]) # y1, y2 return boxes def scale_boxes(self, img1_shape, boxes, img0_shape, ratio_pad=None, padding=True, xywh=False): if ratio_pad is None: # calculate from img0_shape gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # gain = old / new pad = ( round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1), round((img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1), ) # wh padding else: gain = ratio_pad[0][0] pad = ratio_pad[1] if padding: boxes[..., 0] -= pad[0] # x padding boxes[..., 1] -= pad[1] # y padding if not xywh: boxes[..., 2] -= pad[0] # x padding boxes[..., 3] -= pad[1] # y padding boxes[..., :4] /= gain return self.clip_boxes(boxes, img0_shape) if __name__ == '__main__': weights = r'yolov8n.pt' cuda = 'cuda:0' save_path = "./runs" if not os.path.exists(save_path): os.mkdir(save_path) model = YOLOV8DetectionInfer(weights, cuda, 0.25, 0.45) img_path = r'./ultralytics/assets/bus.jpg' model.infer(img_path, save_path)