DETRs with Collaborative Hybrid Assignments Training

In this paper, we observe that in DETR with one-to-one set matching, too few queries are assigned as positive samples. This leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely Co-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster R-CNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. At inference, these auxiliary heads are discarded, so our method introduces no additional parameters or computational cost to the original detector and requires no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. Specifically, we improve the basic Deformable-DETR by 5.8% AP in 12-epoch training and 3.2% AP in 36-epoch training. The state-of-the-art DINO-Deformable-DETR with Swin-L can still be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, when incorporated with a ViT-L backbone, we achieve 65.6% AP on COCO test-dev, outperforming previous methods with much smaller model sizes. Code will be available at https://github.com/Sense-X/Co-DETR.
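To make the sparsity argument concrete, the toy Python sketch below (not the Co-DETR implementation) counts positive samples under the two kinds of assignment. One-to-one matching is done by brute-force search over IoU alone, standing in for DETR's Hungarian matching, whose cost in practice also includes classification and box-regression terms. The one-to-many rule is a simplified IoU-threshold assignment in the spirit of Faster R-CNN, assuming a positive threshold of 0.5 and omitting negative thresholds and ATSS's adaptive selection. The helper names `one_to_one`, `one_to_many`, and `positive_queries`, as well as the boxes, are hypothetical illustrations.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_one(preds: List[Box], gts: List[Box]) -> Dict[int, int]:
    """Optimal one-to-one matching that maximizes total IoU.

    Brute force over ordered choices of distinct predictions (toy sizes only),
    standing in for Hungarian matching. Assuming len(preds) >= len(gts), each
    GT receives exactly one prediction, so the number of positives is len(gts).
    """
    best: Tuple[int, ...] = ()
    best_score = -1.0
    # perm[g] is the prediction index assigned to ground truth g.
    for perm in permutations(range(len(preds)), len(gts)):
        score = sum(iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return {p: g for g, p in enumerate(best)}


def one_to_many(preds: List[Box], gts: List[Box], pos_thr: float = 0.5) -> Dict[int, int]:
    """Simplified IoU-threshold assignment (hypothetical): every prediction whose
    best IoU with some GT reaches pos_thr becomes a positive for that GT."""
    if not gts:
        return {}
    assign: Dict[int, int] = {}
    for p, box in enumerate(preds):
        ious = [iou(box, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda i: ious[i])
        if ious[g] >= pos_thr:
            assign[p] = g
    return assign


def positive_queries(preds: List[Box], assign: Dict[int, int]) -> List[Tuple[Box, int]]:
    """Hypothetical helper: pair each positive box with its target GT index,
    e.g. as reference boxes for extra positive decoder queries."""
    return [(preds[p], g) for p, g in sorted(assign.items())]


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0, 0, 10, 10),    # exact match for GT 0
    (1, 1, 11, 11),    # IoU ~0.68 with GT 0
    (0, 0, 9, 10),     # IoU 0.9 with GT 0
    (20, 20, 30, 30),  # exact match for GT 1
    (21, 20, 31, 30),  # IoU ~0.82 with GT 1
    (50, 50, 60, 60),  # background: no overlap with any GT
]

o2o = one_to_one(preds, gts)
o2m = one_to_many(preds, gts)
print(len(o2o), len(o2m))  # 2 5
```

With two ground-truth boxes, one-to-one matching yields exactly two positives, while the one-to-many rule yields five. Under the same assumptions, the boxes returned by `positive_queries` illustrate the kind of positive coordinates that could seed extra positive queries for the decoder.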

翻译：在本文中，我们观察到DETR中采用一对一套匹配时，被分配为正样本的查询数量过少，导致编码器输出上的监督稀疏，这严重影响了编码器的判别性特征学习，反之亦然，对解码器的注意力学习也产生负面影响。为缓解这一问题，我们提出了一种新颖的协作混合分配训练方案——Co-DETR，通过多样化的标签分配方式学习更高效且有效的基于DETR的检测器。这种新训练方案通过训练多个并行辅助头（由一对多标签分配如ATSS和Faster RCNN监督），能够轻松增强端到端检测器中编码器的学习能力。此外，我们通过从这些辅助头中提取正坐标来额外定制正样本查询，以提高解码器中正样本的训练效率。在推理阶段，这些辅助头被丢弃，因此我们的方法不引入额外参数和计算成本，同时无需手工设计的非极大值抑制（NMS）。我们进行了大量实验来评估所提方法在DETR变体（包括DAB-DETR、Deformable-DETR和DINO-Deformable-DETR）上的有效性。具体而言，我们将基础Deformable-DETR在12轮训练中提升了5.8%的AP，在36轮训练中提升了3.2%的AP。采用Swin-L骨干的最先进DINO-Deformable-DETR在COCO val上仍可从58.5%提升至59.5%的AP。令人惊讶的是，结合ViT-L骨干，我们在COCO test-dev上实现了65.6%的AP，以更小的模型规模超越了先前方法。代码将在https://github.com/Sense-X/Co-DETR 提供。