In recent years, YOLOs have emerged as the predominant paradigm in real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored architectural designs, optimization objectives, data augmentation strategies, and other aspects of YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers end-to-end deployment of YOLOs and adversely affects inference latency. In addition, the design of various components in YOLOs has not been comprehensively and thoroughly inspected, resulting in noticeable computational redundancy and limiting the model's capability. This leads to suboptimal efficiency and leaves considerable room for performance improvement. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from the perspectives of both post-processing and model architecture. To this end, we first present consistent dual assignments for NMS-free training of YOLOs, which bring competitive performance and low inference latency simultaneously. Moreover, we introduce a holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both the efficiency and accuracy perspectives, which greatly reduces computational overhead and enhances capability. The outcome of our effort is a new generation of the YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8$\times$ faster than RT-DETR-R18 at a similar AP on COCO, while having 2.8$\times$ fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% lower latency and 25\% fewer parameters at the same performance.
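For readers unfamiliar with the post-processing step mentioned above, the following is a minimal sketch of generic greedy NMS in plain Python. It is not taken from YOLOv10 or any YOLO codebase; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are illustrative assumptions. It shows the kind of sequential, data-dependent filtering that runs after the network's forward pass and that NMS-free training aims to make unnecessary.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box does not
    exceed iou_threshold (an illustrative default, not a value from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping detections and one separate detection.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

Because each keep-or-discard decision depends on the boxes already kept, the loop is inherently sequential and its cost grows with the number of candidate detections, which is one reason the step adds to inference latency and complicates end-to-end deployment.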