YOLO26N Lightweight Model: Mobile and Embedded Deployment Guide
1. YOLO26N Model Specifications
```
YOLO26N core parameters:
├── Parameters: 2.6M
├── FLOPs: 5.1G
├── mAP50-95: 38.5 (COCO)
├── Input size: 640x640
├── Model file: ~5MB (FP16) / ~3MB (INT8)
└── Inference speed:
    ├── RTX 4090: 1.2ms
    ├── Jetson Orin NX: 4.5ms (FP16) / 3.2ms (INT8)
    ├── RK3588 NPU: 8ms (INT8)
    └── Phone CPU: 15ms
```

2. Model Export
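The file sizes quoted in the specification above follow almost directly from the parameter count and the precision chosen at export time. The sketch below is a back-of-the-envelope estimate only: the function name `estimate_weight_size_mb` is hypothetical, and it ignores graph metadata and any layers a converter leaves unquantized, so real files will differ slightly.

```python
# Rough weight-size estimate: parameters x bytes per weight.
# Assumption: container overhead and unquantized layers are ignored.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_weight_size_mb(num_params: float, precision: str) -> float:
    """Return the approximate weight payload in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e6


if __name__ == "__main__":
    params = 2.6e6  # parameter count from the specification above
    for precision in ("fp32", "fp16", "int8"):
        size = estimate_weight_size_mb(params, precision)
        print(f"{precision}: ~{size:.1f} MB")
```

With 2.6M parameters this gives roughly 5.2 MB at FP16 and 2.6 MB at INT8, in line with the ~5MB and ~3MB figures quoted above.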
```python
#!/usr/bin/env python3
"""export_yolo26n.py - export YOLO26N to multiple formats"""
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ONNX (general purpose)
model.export(format="onnx", imgsz=640, opset=11, simplify=True)

# TensorRT (NVIDIA GPU); build the engine on the device that will run it
model.export(format="engine", imgsz=640, half=True, batch=1)

# CoreML (iOS)
model.export(format="coreml", imgsz=640, half=True)

# TFLite (Android); INT8 export runs calibration, pass data=... to use your own dataset
model.export(format="tflite", imgsz=640, int8=True)

# NCNN (general-purpose mobile)
model.export(format="ncnn", imgsz=640)

# OpenVINO (Intel)
model.export(format="openvino", imgsz=640, half=True)
```

3. Android Deployment (TFLite)
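The Kotlin detector in this section hands the raw model output to a `postprocess` helper that the guide does not show. As a reference for what that step has to do, here is a minimal sketch in plain Python. It assumes the raw output is laid out as 84 rows by N candidate columns (4 box values `cx, cy, w, h`, then 80 class scores), which is what the output buffer allocation implies. Whether the box values are in pixels or normalized depends on the exporter. If the exported model already emits final detections, the NMS step is unnecessary. All function names here are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2
Detection = Tuple[Box, float, int]       # box, confidence, class id


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decode(output: List[List[float]], conf_thres: float = 0.25) -> List[Detection]:
    """Turn a [4 + num_classes][N] output into thresholded corner-format boxes."""
    num_candidates = len(output[0])
    detections = []
    for i in range(num_candidates):
        scores = [row[i] for row in output[4:]]
        cls = max(range(len(scores)), key=scores.__getitem__)
        conf = scores[cls]
        if conf < conf_thres:
            continue
        cx, cy, w, h = (output[k][i] for k in range(4))
        detections.append(((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), conf, cls))
    return detections


def nms(dets: List[Detection], iou_thres: float = 0.45) -> List[Detection]:
    """Greedy per-class non-maximum suppression, highest confidence first."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Three candidates, two classes: columns 0 and 1 overlap heavily, column 2 is weak.
    raw = [
        [100.0, 102.0, 400.0],  # cx
        [100.0, 101.0, 400.0],  # cy
        [50.0, 50.0, 80.0],     # w
        [50.0, 50.0, 80.0],     # h
        [0.9, 0.8, 0.1],        # class 0 scores
        [0.05, 0.1, 0.2],       # class 1 scores
    ]
    for box, conf, cls in nms(decode(raw)):
        print(cls, conf, box)
```

In the toy input, the weak third candidate falls below the confidence threshold and the second is suppressed by the stronger overlapping first, so a single class-0 detection remains.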
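```kotlin
// YOLO26NDetector.kt
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Detection, loadModelFile() and postprocess() are assumed to be defined elsewhere.
class YOLO26NDetector(context: Context) {
    private var interpreter: Interpreter? = null
    private val inputSize = 640
    private val numClasses = 80

    init {
        val model = loadModelFile(context, "yolo26n_int8.tflite")
        val options = Interpreter.Options().apply {
            setNumThreads(4)
            addDelegate(GpuDelegate()) // GPU acceleration
        }
        interpreter = Interpreter(model, options)
    }

    fun detect(bitmap: Bitmap): List<Detection> {
        // Preprocess
        val input = preprocess(bitmap)

        // Inference. The output array must match the tensor shape exactly;
        // [1, 84, 8400] is assumed here (4 box values + 80 classes).
        val output = Array(1) { Array(4 + numClasses) { FloatArray(8400) } }
        interpreter?.run(input, output)

        // Postprocess
        return postprocess(output[0], bitmap.width, bitmap.height)
    }

    private fun preprocess(bitmap: Bitmap): ByteBuffer {
        val buffer = ByteBuffer.allocateDirect(4 * inputSize * inputSize * 3)
        buffer.order(ByteOrder.nativeOrder())

        val resized = Bitmap.createScaledBitmap(bitmap, inputSize, inputSize, true)
        val pixels = IntArray(inputSize * inputSize)
        resized.getPixels(pixels, 0, inputSize, 0, 0, inputSize, inputSize)

        // Unpack ARGB ints into normalized RGB floats
        for (pixel in pixels) {
            buffer.putFloat((pixel shr 16 and 0xFF) / 255f)
            buffer.putFloat((pixel shr 8 and 0xFF) / 255f)
            buffer.putFloat((pixel and 0xFF) / 255f)
        }
        buffer.rewind()
        return buffer
    }
}
```

4. iOS Deployment (CoreML)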
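Vision returns `boundingBox` as a normalized rectangle whose origin is the bottom-left corner of the image, whereas UIKit drawing uses a top-left origin. The conversion is easy to get wrong, so here is the arithmetic as a small Python sketch. The function name `vision_to_pixel_rect` is hypothetical and used only for illustration.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def vision_to_pixel_rect(bbox: Rect, image_w: int, image_h: int) -> Rect:
    """Convert a normalized, bottom-left-origin rect to a top-left-origin
    rect in pixel coordinates."""
    x, y, w, h = bbox
    px = x * image_w
    pw = w * image_w
    ph = h * image_h
    # Flip the vertical axis: the box's top edge sits at y + h in Vision space.
    py = (1.0 - y - h) * image_h
    return (px, py, pw, ph)


if __name__ == "__main__":
    # A box occupying the bottom-left quarter of a 1920x1080 image.
    print(vision_to_pixel_rect((0.0, 0.0, 0.5, 0.5), 1920, 1080))
    # (0.0, 540.0, 960.0, 540.0)
```

The bottom-left quarter in Vision space becomes a rectangle starting halfway down the image in top-left coordinates, which is the expected flip.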
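```swift
// YOLO26NDetector.swift
import CoreML
import UIKit
import Vision

// Detection is assumed to be defined elsewhere.
class YOLO26NDetector {
    private var model: VNCoreMLModel?

    init() {
        guard let modelURL = Bundle.main.url(forResource: "yolo26n", withExtension: "mlmodelc"),
              let mlModel = try? MLModel(contentsOf: modelURL),
              let vnModel = try? VNCoreMLModel(for: mlModel) else {
            return
        }
        self.model = vnModel
    }

    func detect(image: UIImage, completion: @escaping ([Detection]) -> Void) {
        // Bail out instead of force-unwrapping if the model failed to load
        guard let model = model, let cgImage = image.cgImage else {
            completion([])
            return
        }

        let request = VNCoreMLRequest(model: model) { request, error in
            // Vision returns VNRecognizedObjectObservation only for models packaged
            // as object detectors; other models yield feature-value observations.
            guard let results = request.results as? [VNRecognizedObjectObservation] else {
                completion([])
                return
            }
            let detections = results.compactMap { obs -> Detection? in
                guard let label = obs.labels.first else { return nil }
                return Detection(
                    bbox: obs.boundingBox,
                    confidence: label.confidence,
                    className: label.identifier
                )
            }
            completion(detections)
        }
        request.imageCropAndScaleOption = .scaleFill

        let handler = VNImageRequestHandler(cgImage: cgImage)
        try? handler.perform([request])
    }
}
```

5. Embedded Deployment (NCNN)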
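`from_pixels_resize` stretches the frame to 640x640 without preserving its aspect ratio, so boxes predicted in network coordinates have to be scaled back with separate horizontal and vertical factors. The `postprocess` helper used in this section is not shown in the guide. The Python sketch below illustrates only that rescaling step, assuming boxes arrive as corner coordinates in 640x640 input pixels. `rescale_box` is a hypothetical name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def rescale_box(box: Box, image_w: int, image_h: int, input_size: int = 640) -> Box:
    """Map a box from the stretched input_size x input_size frame back to the
    original image, clamping to the image bounds."""
    sx = image_w / input_size
    sy = image_h / input_size
    x1, y1, x2, y2 = box
    return (
        min(max(x1 * sx, 0.0), float(image_w)),
        min(max(y1 * sy, 0.0), float(image_h)),
        min(max(x2 * sx, 0.0), float(image_w)),
        min(max(y2 * sy, 0.0), float(image_h)),
    )


if __name__ == "__main__":
    # A 1280x720 frame: x scales by 2.0, y by 1.125.
    print(rescale_box((160.0, 320.0, 480.0, 640.0), 1280, 720))
    # (320.0, 360.0, 960.0, 720.0)
```

If the pipeline letterboxes instead of stretching, the padding offsets have to be subtracted before scaling, and a single scale factor applies to both axes.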
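```cpp
// yolo26n_ncnn.cpp
#include <vector>

#include <ncnn/mat.h>
#include <ncnn/net.h>
#include <opencv2/core.hpp>

// Detection and postprocess() are assumed to be defined elsewhere.
class YOLO26NDetector {
public:
    // Returns 0 on success, -1 if either file fails to load
    int load(const char* param_path, const char* bin_path) {
        if (net.load_param(param_path) != 0) return -1;
        if (net.load_model(bin_path) != 0) return -1;
        return 0;
    }

    std::vector<Detection> detect(const cv::Mat& image) {
        // Preprocess: BGR -> RGB, stretch to 640x640, scale to [0, 1]
        ncnn::Mat in = ncnn::Mat::from_pixels_resize(
            image.data, ncnn::Mat::PIXEL_BGR2RGB,
            image.cols, image.rows, 640, 640);
        const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
        in.substract_mean_normalize(0, norm_vals);  // "substract" is ncnn's own spelling

        // Inference. Blob names depend on the converter; check the .param file.
        ncnn::Extractor ex = net.create_extractor();
        ex.input("images", in);
        ncnn::Mat out;
        ex.extract("output0", out);

        // Postprocess
        return postprocess(out, image.cols, image.rows);
    }

private:
    ncnn::Net net;
};
```

6. Performance Benchmarks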
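The FPS figures in this section are the reciprocal of single-image latency, that is FPS = 1000 / latency in milliseconds, rounded down. The short Python sketch below makes the conversion explicit. `latency_ms_to_fps` is a hypothetical helper, and it assumes the latency covers the whole per-frame pipeline; if pre- and post-processing are excluded from the measurement, end-to-end FPS will be lower.

```python
def latency_ms_to_fps(latency_ms: float) -> int:
    """Throughput for back-to-back single-image inference, rounded down."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return int(1000.0 / latency_ms)


if __name__ == "__main__":
    for name, ms in [("Jetson Orin NX INT8", 3.2), ("RK3588 CPU FP32", 45.0)]:
        print(f"{name}: {latency_ms_to_fps(ms)} FPS")
```

For example, 3.2ms corresponds to 312 FPS and 45ms to 22 FPS, matching the benchmark rows for Jetson Orin NX (INT8) and the RK3588 CPU.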
YOLO26N performance by platform:

| Platform | Precision | Latency | FPS |
|---|---|---|---|
| RTX 4090 | FP16 | 1.2ms | 833 |
| Jetson Orin NX | FP16 | 4.5ms | 222 |
| Jetson Orin NX | INT8 | 3.2ms | 312 |
| Jetson Orin Nano | FP16 | 8ms | 125 |
| RK3588 NPU | INT8 | 8ms | 125 |
| RK3588 CPU | FP32 | 45ms | 22 |
| iPhone 15 Pro | FP16 | 5ms | 200 |
| Pixel 8 Pro | FP16 | 12ms | 83 |
| Raspberry Pi 5 | FP32 | 80ms | 12 |

Summary

| Platform | Recommended format | Expected FPS |
|---|---|---|
| NVIDIA GPU | TensorRT FP16 | 200+ |
| Jetson | TensorRT INT8 | 300+ |
| Android | TFLite INT8 | 80+ |
| iOS | CoreML FP16 | 200+ |
| RK3588 | RKNN INT8 | 120+ |
| Embedded | NCNN | 30+ |
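The summary table can be encoded as a small lookup so that a build or packaging script can choose an export target per platform. This is an illustrative sketch only: the dictionary keys and the `recommend` helper are hypothetical, and the FPS values are the lower bounds from the table, not new measurements.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    export_format: str
    min_expected_fps: int


# Values copied from the summary table above; the keys are illustrative.
RECOMMENDED = {
    "nvidia_gpu": Recommendation("TensorRT FP16", 200),
    "jetson": Recommendation("TensorRT INT8", 300),
    "android": Recommendation("TFLite INT8", 80),
    "ios": Recommendation("CoreML FP16", 200),
    "rk3588": Recommendation("RKNN INT8", 120),
    "embedded": Recommendation("NCNN", 30),
}


def recommend(platform: str) -> Recommendation:
    """Look up the recommended format, failing loudly on unknown platforms."""
    try:
        return RECOMMENDED[platform.lower()]
    except KeyError:
        raise ValueError(f"no recommendation for platform {platform!r}") from None


if __name__ == "__main__":
    rec = recommend("Jetson")
    print(rec.export_format, rec.min_expected_fps)
```

Raising on an unknown platform, rather than silently falling back to a default, keeps a misconfigured build from shipping the wrong model format.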
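Key takeaways:
- YOLO26N is the lightest variant: only 2.6M parameters and a ~5MB model file at FP16
- INT8 quantization shrinks it to ~3MB, which suits OTA updates
- Multi-framework support: TFLite, CoreML, NCNN, and TensorRT are all covered
- Real-time performance: 30+ FPS on most platforms; the FP32 runs on the RK3588 CPU (22 FPS) and Raspberry Pi 5 (12 FPS) are the exceptions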
