We present YOLOBench, a benchmark comprising 550+ YOLO-based object detection models evaluated on 4 different datasets and 4 different embedded hardware platforms (x86 CPU, ARM CPU, Nvidia GPU, NPU). We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales through a fair, controlled comparison with a fixed training environment (code and training hyperparameters). Pareto-optimality analysis of the collected data reveals that, when modern detection heads and training techniques are incorporated into the learning process, multiple architectures of the YOLO series achieve a good accuracy-latency trade-off, including older models such as YOLOv3 and YOLOv4. We also evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline such as MAC count, some of them can be effectively used to predict Pareto-optimal detection models. We showcase this by using a zero-cost proxy to identify a YOLO architecture that is competitive with a state-of-the-art YOLOv8 model on a Raspberry Pi 4 CPU. The code and data are available at https://github.com/Deeplite/deeplite-torch-zoo.
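To make the notion of Pareto-optimality in the accuracy-latency plane concrete, the following is a minimal sketch of how a Pareto front could be extracted from a set of (latency, accuracy) measurements. It is not taken from the YOLOBench code base: the `Model` record, the `pareto_front` helper, the field names, and all numbers are hypothetical placeholders chosen for illustration only.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical record for one benchmarked detector."""
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # e.g. mAP, higher is better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that no other model dominates.

    A model is dominated if another model is at least as fast and at
    least as accurate, and strictly better in one of the two.
    """
    # Sort by latency (ascending); break ties by accuracy (descending)
    # so the best model at a given latency is seen first.
    ordered = sorted(models, key=lambda m: (m.latency_ms, -m.accuracy))

    front: List[Model] = []
    best_accuracy = float("-inf")
    for m in ordered:
        # Keep a model only if it is more accurate than every faster
        # (or equally fast) model seen so far. Exact duplicates of a
        # kept point are dropped by the strict comparison.
        if m.accuracy > best_accuracy:
            front.append(m)
            best_accuracy = m.accuracy
    return front


if __name__ == "__main__":
    # Made-up numbers, not results from the benchmark.
    candidates = [
        Model("model_a", 10.0, 0.30),
        Model("model_b", 20.0, 0.28),  # slower and less accurate than model_a
        Model("model_c", 25.0, 0.40),
        Model("model_d", 25.0, 0.35),  # same latency as model_c, less accurate
        Model("model_e", 50.0, 0.45),
    ]
    for m in pareto_front(candidates):
        print(m.name, m.latency_ms, m.accuracy)
```

With this sort-and-sweep approach the front is found in O(n log n) time, which is ample for a few hundred models per dataset and hardware target. In the setting described above, one such front would be computed separately for each dataset and hardware platform, since latency depends on the device.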