In this report, we present DAMO-YOLO, a fast and accurate object detection method that outperforms the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques: Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search for our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In designing the neck and head, we follow the rule of ``large neck, small head''. We adopt Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. We then investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, we propose AlignedOTA to solve the misalignment problem in label assignment, and we introduce a distillation scheme to further improve performance. Based on these techniques, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L, which achieve 43.6/47.7/50.2/51.9 mAP on COCO with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. For edge devices with limited computing power, we also propose the lightweight models DAMO-YOLO-Ns/Nm/Nl, which achieve 32.3/38.2/40.5 mAP on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU, respectively. Our general and lightweight models outperform other YOLO series models in their respective application scenarios.
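To make the reported accuracy/latency trade-off easier to compare, the figures quoted above can be tabulated and converted to an idealized throughput. The following is a minimal sketch, not part of DAMO-YOLO itself: the names `Reported`, `fps`, and `summarize` are hypothetical, and the throughput assumes images are processed one at a time at exactly the quoted latency, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reported:
    """One model's figures as quoted in the abstract (hypothetical container)."""
    name: str
    map_coco: float    # mAP on COCO
    latency_ms: float  # latency in milliseconds
    device: str        # hardware the latency was reported on


# Numbers copied from the abstract; nothing here is measured by this script.
REPORTED = [
    Reported("DAMO-YOLO-T", 43.6, 2.78, "T4 GPU"),
    Reported("DAMO-YOLO-S", 47.7, 3.83, "T4 GPU"),
    Reported("DAMO-YOLO-M", 50.2, 5.62, "T4 GPU"),
    Reported("DAMO-YOLO-L", 51.9, 7.95, "T4 GPU"),
    Reported("DAMO-YOLO-Ns", 32.3, 4.08, "X86-CPU"),
    Reported("DAMO-YOLO-Nm", 38.2, 5.05, "X86-CPU"),
    Reported("DAMO-YOLO-Nl", 40.5, 6.69, "X86-CPU"),
]


def fps(latency_ms: float) -> float:
    """Idealized frames per second, assuming sequential single-image inference."""
    return 1000.0 / latency_ms


def summarize(models):
    """Return (name, device, mAP, latency_ms, idealized FPS) rows."""
    return [
        (m.name, m.device, m.map_coco, m.latency_ms, round(fps(m.latency_ms), 1))
        for m in models
    ]


if __name__ == "__main__":
    for name, device, m_ap, latency, rate in summarize(REPORTED):
        print(f"{name:<13} {device:<8} mAP={m_ap:<5} latency={latency} ms  ~{rate} FPS")
```

Within each family, both mAP and latency increase with model size, so the choice between variants comes down to the latency budget of the target device. The GPU and CPU families are not directly comparable, since their latencies were reported on different hardware.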