One-to-one set matching is the key design that gives DETR its end-to-end capability: object detection no longer needs hand-crafted NMS (non-maximum suppression) to remove duplicate detections. This end-to-end property is important for DETR's versatility and has been generalized to a broader range of vision tasks. However, one-to-one set matching assigns only a few queries as positive samples, which significantly reduces the training efficacy of positive samples. We propose a simple yet effective method based on a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training. This hybrid strategy significantly improves accuracy. At inference, only the original one-to-one matching branch is used, which preserves the end-to-end merit and the inference efficiency of DETR. We name the method H-DETR and show that it consistently improves a wide range of representative DETR methods, including DeformableDETR, PETRv2, PETR, and TransTrack, across a variety of visual tasks. The code is available at: https://github.com/HDETR
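To make the contrast between the two matching branches concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the one-to-many branch is realized by repeating each ground-truth box `k` times before Hungarian matching; the abstract does not specify this mechanism, and the function names, the value of `k`, and the random cost matrix are all hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def one_to_one_match(cost):
    """Hungarian matching on a (num_queries, num_gt) cost matrix.

    Each ground-truth object is assigned to at most one query, so the
    number of positive queries is at most num_gt.
    """
    query_idx, gt_idx = linear_sum_assignment(cost)
    return query_idx, gt_idx


def one_to_many_match(cost, k):
    """Assumed one-to-many variant: repeat every ground-truth column k times.

    After repetition, Hungarian matching can assign up to k distinct
    queries to the same ground-truth object.
    """
    num_gt = cost.shape[1]
    repeated = np.tile(cost, (1, k))  # shape: (num_queries, k * num_gt)
    query_idx, col_idx = linear_sum_assignment(repeated)
    # Map each repeated column back to its original ground-truth index.
    return query_idx, col_idx % num_gt


# Hypothetical example: 6 queries, 2 ground-truth objects, random costs.
cost = np.random.default_rng(0).random((6, 2))
q_main, _ = one_to_one_match(cost)
q_aux, _ = one_to_many_match(cost, k=3)
print(len(q_main), "positive queries in the one-to-one branch")
print(len(q_aux), "positive queries in the one-to-many branch")
```

With 6 queries and 2 objects, the one-to-one branch yields 2 positive queries, while the assumed one-to-many branch with `k=3` yields 6. During training, the losses of both branches would be combined; at inference only the one-to-one branch is kept, as the abstract states.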