

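YOLO is an object detection system that predicts from the global context of an image, and it has kept up a very fast release cadence. The upgrade from YOLOv5 to YOLOv8 mainly covers the network architecture and algorithms, the command-line interface, and the Python API. YOLOv8 can be trained on large datasets and runs on a wide range of hardware platforms. Another key feature is extensibility: because it is designed as a framework that supports all previous YOLO versions, switching between versions and comparing their performance is easy.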





├── project                        # X3 working directory
│   ├── calib_f32                  # quantization calibration dataset
│   ├── coco128                    # images for calibration and detection
│   ├── config.yaml                # onnx-to-bin model conversion config
│   ├── modules.py -> ../ultralytics/ultralytics/nn/modules.py  # symlink to the YOLOv8 post-processing file
│   ├── onnxruntime-infer.py       # load the ONNX model and run detection on a PC
│   ├── requirements.txt           # Python dependencies
│   ├── step1_export_onnx.py       # YOLOv8 ONNX export
│   ├── step2_make_calib.py        # build the quantization calibration dataset
│   ├── step3_convert_bin.sh       # onnx-to-bin conversion script
│   ├── step4_inference.py         # X3 inference code
│   ├── yolo-comparison-plots.png  # YOLO model comparison plot
│   ├── yolov8n.onnx               # exported ONNX model
│   ├── yolov8n.pt                 # YOLOv8 PyTorch weights
│   └── yolov8n_horizon.bin        # converted bin model
├── ultralytics                    # YOLOv8 repository
│   ├── CITATION.cff
│   ├── CONTRIBUTING.md
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── README.md
│   ├── README.zh-CN.md
│   ├── docker
│   ├── docs
│   ├── examples
│   ├── mkdocs.yml
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   ├── tests
│   └── ultralytics

YOLOv8 PyTorch Environment Setup

Export the ONNX model on a development machine. Install PyTorch, ONNX, and the other dependencies first, then install YOLOv8:

cd project
python3 -m pip install -r requirements.txt
cd ../ultralytics
python3 setup.py install
cd ../project



The file ultralytics/ultralytics/nn/modules.py from the YOLOv8 repository is symlinked to project/modules.py so that the modified code is easy to locate. The modification relies on two tricks:

# *************************************************************************************************************** #
# *************************************************************************************************************** #
# Original repository version, which includes post-processing. Comment it out!
# def forward(self, x):
#     shape = x[0].shape  # BCHW
#     for i in range(self.nl):
#         x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
#     if self.training:
#         return x
#     elif self.dynamic or self.shape != shape:
#         self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
#         self.shape = shape
#
#     box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
#     dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
#     y = torch.cat((dbox, cls.sigmoid()), 1)
#     return y if self.export else (y, x)
# *************************************************************************************************************** #
# *************************************************************************************************************** #
# Version used for X3 deployment
def forward(self, x):
    res = []
    for i in range(self.nl):
        bboxes = self.cv2[i](x[i]).permute(0, 2, 3, 1)
        scores = self.cv3[i](x[i]).permute(0, 2, 3, 1)
        res.append(bboxes)
        res.append(scores)
    # Returning a tuple avoids an error during export
    return tuple(res)
# *************************************************************************************************************** #
# *************************************************************************************************************** #

  • Export Transpose (permute) nodes

bboxes = self.cv2[i](x[i]).permute(0, 2, 3, 1)
scores = self.cv3[i](x[i]).permute(0, 2, 3, 1)

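The Sunrise X3 Pi expects models in NHWC layout, whereas PyTorch trains models in NCHW. When the exported ONNX model is converted to bin, the toolchain therefore inserts Transpose nodes at the head and tail of the network. That Transpose uses the order [0, 3, 1, 2], which exactly cancels the [0, 2, 3, 1] permute we inserted, so in effect one Transpose node disappears. This avoids unnecessary computation and speeds up inference.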

  • Return the outputs as a tuple
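
Both tricks can be checked quickly outside the toolchain. The sketch below uses NumPy rather than PyTorch, and the feature-map size, the strides (8, 16, 32), reg_max = 16 and the 80 classes are assumptions based on the default yolov8n model with a 640x640 input. head_output_shapes is a hypothetical helper written for this illustration; it is not part of the repository.

```python
import numpy as np

# Trick 1: the permute in forward() and the Transpose order [0, 3, 1, 2] cancel out.
nchw = np.arange(1 * 64 * 80 * 80, dtype=np.float32).reshape(1, 64, 80, 80)
nhwc = nchw.transpose(0, 2, 3, 1)       # same axis order as permute(0, 2, 3, 1)
restored = nhwc.transpose(0, 3, 1, 2)   # the order quoted in the text
print(nhwc.shape)                       # (1, 80, 80, 64)
print(np.array_equal(restored, nchw))   # True


# Trick 2: the tuple holds one (bboxes, scores) pair per detection scale.
def head_output_shapes(input_h=640, input_w=640, strides=(8, 16, 32),
                       reg_max=16, num_classes=80):
    """List the NHWC shapes returned by the modified forward(), in order.

    All default values are assumptions about the stock yolov8n model.
    """
    shapes = []
    for s in strides:
        h, w = input_h // s, input_w // s
        shapes.append((1, h, w, 4 * reg_max))  # bboxes from cv2[i]
        shapes.append((1, h, w, num_classes))  # scores from cv3[i]
    return shapes


for shape in head_output_shapes():
    print(shape)
```

Under these assumptions the head produces six outputs, two for each of the three scales.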



Running step1_export_onnx.py downloads the official weights and exports the ONNX model.

# Import YOLOv8
from ultralytics import YOLO

# Load the pretrained weights
model = YOLO("yolov8n.pt")
# Use opset=11 and simplify the ONNX model with onnx-sim
success = model.export(format="onnx", opset=11, simplify=True)

python3 step1_export_onnx.py

Note: the Sunrise X3 Pi supports ONNX opset 10 or 11. Models exported with any other opset version will fail to compile in the model toolchain.



import cv2
import numpy as np


def letterbox(im, new_shape=(640, 640), color=114):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # Compute padding
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right,
                            cv2.BORDER_CONSTANT, value=(color, color, color))  # add border
    return im, 1 / r, (dw, dh)


def ratioresize(im, new_shape=(640, 640), color=114):
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    new_h, new_w = new_shape
    padded_img = np.ones((new_h, new_w, 3), dtype=np.uint8) * color
    # Scale ratio (new / old)
    r = min(new_h / shape[0], new_w / shape[1])
    # Compute padding
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Paste the resized image into the top-left corner; padding stays on the right and bottom
    padded_img[: new_unpad[1], : new_unpad[0]] = im
    padded_img = np.ascontiguousarray(padded_img)
    return padded_img, 1 / r, (0, 0)

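This article uses two image-scaling methods. letterbox is the method YOLOv8 enables during training. It pads on all four sides, so post-processing has to undo the padding offsets, which is cumbersome. ratioresize keeps the aspect ratio of the image but pads only on the right and bottom, so post-processing needs no offset calculation.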
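
To make the difference concrete, here is a small pure-Python sketch of how a coordinate in the 640x640 model input maps back to the original image under each scheme. The helper names scale_and_pad and to_original are hypothetical, and since postprocess is not shown in this article, the way it actually consumes ratio, dw and dh is an assumption.

```python
def scale_and_pad(orig_h, orig_w, new_h=640, new_w=640, centered=True):
    """Return (ratio, dw, dh) in the same convention as letterbox / ratioresize."""
    r = min(new_h / orig_h, new_w / orig_w)
    unpad_w, unpad_h = int(round(orig_w * r)), int(round(orig_h * r))
    if centered:
        # letterbox: the padding is split evenly between both sides
        dw, dh = (new_w - unpad_w) / 2, (new_h - unpad_h) / 2
    else:
        # ratioresize: the image sits in the top-left corner, so no offset
        dw, dh = 0, 0
    return 1 / r, dw, dh


def to_original(x, y, ratio, dw, dh):
    """Map a point from model-input coordinates to original-image coordinates."""
    return (x - dw) * ratio, (y - dh) * ratio


# A 960x720 (width x height) image; the point (300, 150) in the original
# lands at (200, 180) after letterbox and at (200, 100) after ratioresize.
lb = scale_and_pad(720, 960, centered=True)
rr = scale_and_pad(720, 960, centered=False)
print(lb, to_original(200, 180, *lb))
print(rr, to_original(200, 100, *rr))
```

With ratioresize the offsets are always zero, so mapping back reduces to a single multiplication by ratio.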

if __name__ == '__main__':
    images_path = Path('./coco128')
    model_path = Path('./yolov8n.onnx')
    score_thres = 0.4
    iou_thres = 0.65
    num_classes = 80
    try:
        session = onnxruntime.InferenceSession(str(model_path), providers=['CPUExecutionProvider'])
        model_h, model_w = session.get_inputs()[0].shape[2:]
    except Exception as e:
        print(f'Load model error.\n{e}')
        exit()
    else:
        try:
            # Warm up with 10 inference runs
            for _ in range(10):
                session.run(None, {'images': np.random.randn(1, 3, model_h, model_w).astype(np.float32)})
        except Exception as e:
            print(f'Warm up model error.\n{e}')
    cv2.namedWindow("results", cv2.WINDOW_AUTOSIZE)
    for img_path in images_path.iterdir():
        image = cv2.imread(str(img_path))
        t0 = time.perf_counter()
        ## yolov8 training letterbox
        # resized, ratio, (dw, dh) = letterbox(image, (model_h, model_w))
        resized, ratio, (dw, dh) = ratioresize(image, (model_h, model_w))
        buffer = blob(resized)
        t1 = time.perf_counter()
        outputs = session.run(None, {'images': buffer})
        outputs = [o[0] for o in outputs]
        t2 = time.perf_counter()
        results = postprocess(
            outputs, score_thres, iou_thres,
            image.shape[0], image.shape[1],
            dh, dw, ratio, ratio,
            16, num_classes)
        results = nms(*results)
        t3 = time.perf_counter()
        for (x0, y0, x1, y1, score, label) in results:
            x0, y0, x1, y1 = map(int, [x0, y0, x1, y1])
            cls_id = int(label)
            cls = CLASSES[cls_id]
            color = COLORS[cls]
            cv2.rectangle(image, [x0, y0], [x1, y1], color, 1)
            cv2.putText(image, f'{cls}:{score:.3f}', (x0, y0 - 2),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.325, [0, 0, 225], thickness=1)
        t4 = time.perf_counter()
        cv2.imshow('results', image)
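
The postprocess function is not listed here. Because the modified head returns raw convolution outputs, box decoding presumably has to reproduce what the commented-out forward did with self.dfl and dist2bbox. The following NumPy sketch shows one way that could look for a single scale. decode_boxes is a hypothetical name, the (4, reg_max) channel layout and the 0.5 anchor offset are assumptions taken from the original Ultralytics code, and the result is returned as xyxy rather than xywh.

```python
import numpy as np


def decode_boxes(bboxes, stride, reg_max=16):
    """Decode one raw NHWC box output of shape (H, W, 4 * reg_max) to xyxy pixels."""
    h, w, _ = bboxes.shape
    logits = bboxes.reshape(h, w, 4, reg_max)
    # DFL: softmax over the reg_max bins, then the expected bin index.
    logits = logits - logits.max(axis=-1, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    dist = (prob * np.arange(reg_max)).sum(axis=-1)  # (H, W, 4): left, top, right, bottom
    # Anchor points sit at the cell centres (offset 0.5, as in make_anchors).
    ys, xs = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    x1 = (xs - dist[..., 0]) * stride
    y1 = (ys - dist[..., 1]) * stride
    x2 = (xs + dist[..., 2]) * stride
    y2 = (ys + dist[..., 3]) * stride
    return np.stack([x1, y1, x2, y2], axis=-1)


# Synthetic 2x2 grid: at cell (0, 0) every side strongly prefers bin 1,
# so each distance is about one grid cell.
raw = np.zeros((2, 2, 64), dtype=np.float32)
raw[0, 0, [1, 17, 33, 49]] = 50.0
boxes = decode_boxes(raw, stride=8)
print(boxes.shape)
print(boxes[0, 0])
```

Class scores would additionally need a sigmoid, as in the original forward, before thresholding and NMS.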





img = cv2.imread(str(i))
img = letterbox(img)[0]
img = blob(img[:, :, ::-1])  # bgr -> rgb
print(img.shape)
img.astype(np.float32).tofile(str(save / (i.stem + '.rgbchw')))

Building the calibration dataset comes down to: read image -> resize -> convert uint8 to float -> numpy.tofile. Fifty files with the .rgbchw suffix are generated in the calib_f32 directory:

python3 step2_make_calib.py
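
One thing worth keeping in mind is that numpy.tofile writes raw bytes with no header, so the dtype and shape have to be known when a file is read back. The sketch below illustrates this with random data in a temporary directory. The HWC-to-CHW conversion stands in for blob, whose implementation is not shown, so that part is an assumption, and sample.rgbchw is a made-up file name.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for a letterboxed 640x640 RGB image (random data, not a real photo).
rgb_hwc = np.random.randint(0, 256, size=(640, 640, 3), dtype=np.uint8)

# Assumed behaviour of blob(): HWC -> CHW plus a leading batch axis.
chw = np.ascontiguousarray(rgb_hwc.transpose(2, 0, 1)[None])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.rgbchw"
    chw.astype(np.float32).tofile(str(path))
    # tofile stores raw bytes only, so dtype and shape must be supplied on read.
    size_bytes = path.stat().st_size
    loaded = np.fromfile(str(path), dtype=np.float32).reshape(1, 3, 640, 640)

print(size_bytes)  # 1 * 3 * 640 * 640 values, 4 bytes each
print(np.array_equal(loaded, chw.astype(np.float32)))
```

A quick size check like this is a cheap way to confirm that every calibration file matches the 1x3x640x640 float32 input the converter expects.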



cd ../
wget -c ftp://vrftp.horizon.ai/Open_Explorer_gcc_9.3.0/2.4.2/docker_openexplorer_centos_7_xj3_v2.4.2.tar.gz
docker load -i docker_openexplorer_centos_7_xj3_v2.4.2.tar.gz
docker run -it --name horizonX3 -v ${PWD}/project:/open_explorer/project openexplorer/ai_toolchain_centos_7_xj3:v2.4.2
docker exec -it horizonX3 /bin/bash


cd project
bash step3_convert_bin.sh


/model.22/cv3.2/cv3.2.2/Conv  BPU  id(0)  HzSQuantizedConv  0.998216  67.505043
2023-01-31 21:17:24,261 INFO [Tue Jan 31 21:17:24 2023] End to Horizon NN Model Convert.
2023-01-31 21:17:24,315 INFO start convert to *.bin file....
2023-01-31 21:17:24,345 INFO ONNX model output num : 6
2023-01-31 21:17:24,346 INFO ############# model deps info #############
2023-01-31 21:17:24,346 INFO hb_mapper version : 1.9.9
2023-01-31 21:17:24,346 INFO hbdk version : 3.37.2
2023-01-31 21:17:24,346 INFO hbdk runtime version: 3.14.14
2023-01-31 21:17:24,346 INFO horizon_nn version : 0.14.0
2023-01-31 21:17:24,346 INFO ############# model_parameters info #############
2023-01-31 21:17:24,346 INFO onnx_model : /open_explorer/workspace/yolov8/yolov8n.onnx
2023-01-31 21:17:24,346 INFO BPU march : bernoulli2
2023-01-31 21:17:24,346 INFO layer_out_dump : False
2023-01-31 21:17:24,346 INFO log_level : DEBUG
2023-01-31 21:17:24,346 INFO working dir : /open_explorer/workspace/yolov8/model_output
2023-01-31 21:17:24,346 INFO output_model_file_prefix: yolov8n_horizon
2023-01-31 21:17:24,347 INFO ############# input_parameters info #############
2023-01-31 21:17:24,347 INFO ------------------------------------------
2023-01-31 21:17:24,347 INFO ---------input info : images ---------
2023-01-31 21:17:24,347 INFO input_name : images
2023-01-31 21:17:24,347 INFO input_type_rt : nv12
2023-01-31 21:17:24,347 INFO input_space&range : regular
2023-01-31 21:17:24,347 INFO input_layout_rt : NHWC
2023-01-31 21:17:24,347 INFO input_type_train : rgb
2023-01-31 21:17:24,347 INFO input_layout_train : NCHW
2023-01-31 21:17:24,347 INFO norm_type : data_scale
2023-01-31 21:17:24,347 INFO input_shape : 1x3x640x640
2023-01-31 21:17:24,347 INFO input_batch : 1
2023-01-31 21:17:24,347 INFO scale_value : 0.003921568627451,
2023-01-31 21:17:24,347 INFO cal_data_dir : /open_explorer/calib_f32
2023-01-31 21:17:24,347 INFO ---------input info : images end -------
2023-01-31 21:17:24,347 INFO ------------------------------------------
2023-01-31 21:17:24,347 INFO ############# calibration_parameters info #############
2023-01-31 21:17:24,348 INFO preprocess_on : False
2023-01-31 21:17:24,348 INFO calibration_type: : max
2023-01-31 21:17:24,348 INFO cal_data_type : float32
2023-01-31 21:17:24,348 INFO max_percentile : 0.99999
2023-01-31 21:17:24,348 INFO per_channel : True
2023-01-31 21:17:24,348 INFO ############# compiler_parameters info #############
2023-01-31 21:17:24,348 INFO hbdk_pass_through_params: --core-num 2 --fast --O3
2023-01-31 21:17:24,348 INFO input-source : {'images': 'pyramid', '_default_value': 'ddr'}
2023-01-31 21:17:24,354 INFO Convert to runtime bin file sucessfully!
2023-01-31 21:17:24,354 INFO End Model Convert
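
The scale_value reported in the log is simply 1/255, which suggests that normalization to the 0 to 1 range is folded into the converted model. The short check below confirms the arithmetic. The interpretation that norm_type data_scale means each pixel is multiplied by scale_value is an assumption about the toolchain, not something the log states.

```python
scale_value = 0.003921568627451  # copied from the conversion log

# The value agrees with 1/255 to the precision printed in the log.
print(abs(scale_value - 1 / 255) < 1e-12)

# Assumed meaning of norm_type = data_scale: normalized = pixel * scale_value.
for pixel in (0, 128, 255):
    print(pixel, round(pixel * scale_value, 6))
```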




Pack the project folder and upload it to the Sunrise X3 Pi. You can copy it into the board's working directory over ssh or with a USB drive. Assuming it is saved to /home/sunrise/project, inference pre-processing must convert the input to nv12:

def bgr2nv12_opencv(image):
    height, width = image.shape[:2]
    area = height * width
    yuv420p = cv2.cvtColor(image, cv2.COLOR_BGR2YUV_I420).reshape((area * 3 // 2,))
    y = yuv420p[:area]
    uv_planar = yuv420p[area:].reshape((2, area // 4))
    uv_packed = uv_planar.transpose((1, 0)).reshape((area // 2,))
    nv12 = np.zeros_like(yuv420p)
    nv12[:area] = y
    nv12[area:] = uv_packed
    return nv12
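
The reshape and transpose in bgr2nv12_opencv turn the planar U and V blocks of I420 into the interleaved UV pairs that NV12 uses. A tiny NumPy example with a synthetic 4x4 buffer, using made-up constant values for U and V, makes the reordering visible without needing OpenCV:

```python
import numpy as np

height, width = 4, 4
area = height * width

# Synthetic I420 buffer: Y plane, then the whole U plane, then the whole V plane.
y_plane = np.arange(area, dtype=np.uint8)
u_plane = np.full(area // 4, 100, dtype=np.uint8)
v_plane = np.full(area // 4, 200, dtype=np.uint8)
i420 = np.concatenate([y_plane, u_plane, v_plane])

# Same repacking as bgr2nv12_opencv: (2, N) planar -> (N, 2) interleaved.
uv_planar = i420[area:].reshape((2, area // 4))
uv_packed = uv_planar.transpose((1, 0)).reshape((area // 2,))
nv12 = np.concatenate([i420[:area], uv_packed])

print(uv_packed.tolist())  # [100, 200, 100, 200, 100, 200, 100, 200]
print(nv12.size)           # 24, i.e. area * 3 // 2
```

The Y plane is copied unchanged, and only the chroma half of the buffer is reordered.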


cd /home/sunrise/project
sudo python3 -m pip install opencv-python  # install the X3 inference dependency
mv model_output/yolov8n_horizon.bin ./
sudo python3 step4_inference.py





