In this paper, we observe that assigning too few queries as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder and, in turn, attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely Co-DETR, to learn more efficient and effective DETR-based detectors from diverse label assignment strategies. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments, such as those of ATSS, FCOS, and Faster RCNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. At inference, these auxiliary heads are discarded, so our method introduces no additional parameters or computational cost to the original detector and requires no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. Specifically, we improve the basic Deformable-DETR by 5.8% AP in 12-epoch training and 3.2% AP in 36-epoch training. The state-of-the-art DINO-Deformable-DETR can still be improved from 49.4% to 51.2% AP on MS COCO val. Surprisingly, combined with the large-scale backbone MixMIM-g with 1 billion parameters, we achieve 64.5% mAP on MS COCO test-dev, reaching superior performance with much less extra data. Code will be available at https://github.com/Sense-X/Co-DETR.
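The gap in positive-sample counts between one-to-one and one-to-many assignment can be seen in a toy setting. The sketch below is not the Co-DETR implementation. It assumes an IoU-only matching score, a brute-force search standing in for Hungarian matching, and a fixed IoU threshold of 0.5 as a simple one-to-many rule (loosely in the spirit of anchor-based detectors, not the actual ATSS or FCOS rules). All function names and box values are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_assign(gts, candidates):
    """Toy one-to-one matching: each ground-truth box gets exactly one
    candidate, chosen to maximize total IoU (brute force, small inputs only).
    Returns the set of candidate indices treated as positives."""
    best, best_score = (), -1.0
    for perm in permutations(range(len(candidates)), len(gts)):
        score = sum(iou(g, candidates[c]) for g, c in zip(gts, perm))
        if score > best_score:
            best, best_score = perm, score
    return set(best)


def one_to_many_assign(gts, candidates, iou_thr=0.5):
    """Toy one-to-many rule: every candidate whose IoU with any
    ground-truth box reaches the threshold is a positive."""
    return {
        c
        for c, box in enumerate(candidates)
        if any(iou(g, box) >= iou_thr for g in gts)
    }


# Hypothetical data: 2 ground-truth boxes, 8 candidate boxes.
gts = [(10, 10, 50, 50), (60, 60, 100, 100)]
candidates = [
    (10, 10, 50, 50), (12, 12, 52, 52), (15, 15, 55, 55),
    (60, 60, 100, 100), (65, 65, 105, 105), (0, 0, 20, 20),
    (200, 200, 240, 240), (58, 58, 98, 98),
]

o2o = one_to_one_assign(gts, candidates)
o2m = one_to_many_assign(gts, candidates)
print("one-to-one positives: ", sorted(o2o))
print("one-to-many positives:", sorted(o2m))
```

With one-to-one matching the number of positives always equals the number of ground-truth boxes, however many candidates overlap them, whereas the threshold rule marks every sufficiently overlapping candidate as positive. This is the denser supervision that the auxiliary heads are described as providing during training.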