We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection. Our code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.