One-to-one label assignment in object detection obviates the need for non-maximum suppression (NMS) as post-processing and makes the pipeline end-to-end. However, it introduces a new dilemma: the widely used sparse queries cannot guarantee high recall, while dense queries inevitably produce more similar queries and are harder to optimize. Since both sparse and dense queries are problematic, what queries should end-to-end object detection use? This paper shows that the answer is Dense Distinct Queries (DDQ). Concretely, we first lay dense queries as in traditional detectors and then select distinct ones for one-to-one assignment. DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors, including FCN, R-CNN, and DETR variants. Most notably, DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also shares the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code is available at \url{https://github.com/jshilong/DDQ}.
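The two-step recipe in the abstract (lay dense queries, then keep only distinct ones for one-to-one assignment) can be illustrated with a small sketch. The abstract does not say how distinctness is measured, so this example assumes a greedy, class-agnostic IoU filter with an illustrative threshold of 0.7, applied before assignment rather than as post-processing. The brute-force matcher stands in for the Hungarian matching commonly used in end-to-end detectors and is only practical for toy inputs. All function names and values here are hypothetical and are not taken from the DDQ code base.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_distinct(boxes, scores, iou_thr=0.7):
    """Assumed selection rule: visit queries by descending score and keep
    one only if it overlaps every already-kept query by at most iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def one_to_one_assign(boxes, keep, gt_boxes):
    """Brute-force one-to-one matching that maximizes total IoU.
    Returns {gt_index: query_index}; empty if there are fewer kept
    queries than ground-truth boxes. Toy sizes only."""
    best, best_total = None, -1.0
    for perm in permutations(keep, len(gt_boxes)):
        total = sum(iou(boxes[q], g) for q, g in zip(perm, gt_boxes))
        if total > best_total:
            best, best_total = perm, total
    return dict(enumerate(best or ()))


# Five dense queries: two near-duplicate pairs plus one background box.
boxes = [
    (0, 0, 10, 10), (0.5, 0.5, 10.5, 10.5),
    (20, 20, 30, 30), (20.5, 20.5, 30.5, 30.5),
    (50, 50, 60, 60),
]
scores = [0.9, 0.8, 0.7, 0.85, 0.3]
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]

keep = select_distinct(boxes, scores)                  # distinct query indices
assignment = one_to_one_assign(boxes, keep, gt_boxes)  # gt index -> query index
print(keep, assignment)
```

In this toy input each near-duplicate pair collapses to its higher-scoring member, so the one-to-one matcher never has to choose between two almost identical candidates for the same object, which is the optimization difficulty the abstract attributes to dense queries.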