One-to-one set matching is a key design that gives DETR its end-to-end capability: object detection no longer requires hand-crafted NMS (non-maximum suppression) to remove duplicate detections. This end-to-end property is important for the versatility of DETR, and it has been generalized to broader vision tasks. However, we note that one-to-one set matching assigns only a few queries as positive samples, which significantly reduces the training efficacy of positive samples. We propose a simple yet effective method based on a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training. This hybrid strategy significantly improves accuracy. At inference, only the original one-to-one matching branch is used, which preserves the end-to-end merit and the inference efficiency of DETR. The method, named H-DETR, consistently improves a wide range of representative DETR methods, including DeformableDETR, PETRv2, PETR, and TransTrack, across a wide range of visual tasks. The code is available at: https://github.com/HDETR
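To make the hybrid matching scheme concrete, the following is a minimal sketch under stated assumptions, not the released H-DETR implementation. It assumes that the one-to-many branch can be realized by repeating each ground-truth object `k` times before running an ordinary one-to-one assignment, and that the two branch losses are combined with a scalar `weight`; the abstract specifies neither detail. The function names, the cost values, and the brute-force search (a stand-in for the Hungarian algorithm, practical only for tiny inputs) are all illustrative.

```python
from itertools import permutations


def match_one_to_one(cost):
    """Return (query, gt) pairs minimizing total cost, one distinct query per gt.

    cost[q][g] is the matching cost between query q and ground-truth g.
    Brute force over assignments; assumes num_queries >= num_gt.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best_pairs, best_total = [], float("inf")
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return sorted(best_pairs)


def match_one_to_many(cost, k):
    """Assumed one-to-many matching: give each ground-truth k distinct queries.

    Each ground-truth column is repeated k times, a one-to-one assignment is
    computed on the widened matrix, and columns are mapped back to gt indices.
    """
    num_gt = len(cost[0]) if cost else 0
    repeated = [[row[g] for g in range(num_gt) for _ in range(k)] for row in cost]
    return sorted((q, col // k) for q, col in match_one_to_one(repeated))


def hybrid_loss(loss_one_to_one, loss_one_to_many, weight=1.0):
    """Training-only objective; `weight` is a hypothetical balancing factor."""
    return loss_one_to_one + weight * loss_one_to_many


# Made-up costs: 4 main-branch queries and 6 auxiliary queries, 2 objects.
cost_main = [[0.9, 0.2], [0.1, 0.8], [0.5, 0.6], [0.7, 0.3]]
cost_aux = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3], [0.5, 0.5], [0.6, 0.7]]

main_pairs = match_one_to_one(cost_main)        # one positive query per object
aux_pairs = match_one_to_many(cost_aux, k=2)    # two positive queries per object
print(main_pairs)
print(aux_pairs)
```

In this sketch the main branch yields one positive query per object, while the auxiliary branch yields `k` positives per object, which is the sense in which it adds training signal. At inference only the one-to-one branch would be kept, so the auxiliary matching adds no inference cost.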