One-to-one label assignment in object detection has obviated the need for non-maximum suppression (NMS) as a postprocessing step and made the pipeline end-to-end. However, it introduces a new dilemma: the widely used sparse queries cannot guarantee high recall, while dense queries inevitably produce more similar queries and are harder to optimize. Since both sparse and dense queries are problematic, what queries should end-to-end object detection use? This paper shows that the answer is Dense Distinct Queries (DDQ). Concretely, we first lay dense queries as in traditional detectors and then select distinct ones for one-to-one assignment. DDQ combines the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors, including FCN, R-CNN, and DETRs. Most notably, DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also retains the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code can be found at \url{https://github.com/jshilong/DDQ}.
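The "dense, then distinct" idea can be illustrated with a small sketch. The abstract does not state how distinct queries are selected, so the code below assumes a simple stand-in: a greedy, class-agnostic filter that keeps the highest-scoring queries and drops any query whose box overlaps an already kept box too much. The function names `pairwise_iou` and `select_distinct_queries`, the IoU threshold of 0.7, and the cap of 300 queries are hypothetical choices for illustration, not values taken from the paper or its repository.

```python
import numpy as np


def pairwise_iou(boxes: np.ndarray) -> np.ndarray:
    """Return the IoU matrix for boxes given as (x1, y1, x2, y2) rows."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    # Intersection rectangle for every pair of boxes, via broadcasting.
    ix1 = np.maximum(x1[:, None], x1[None, :])
    iy1 = np.maximum(y1[:, None], y1[None, :])
    ix2 = np.minimum(x2[:, None], x2[None, :])
    iy2 = np.minimum(y2[:, None], y2[None, :])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def select_distinct_queries(boxes, scores, iou_thr=0.7, max_queries=300):
    """Greedily keep high-scoring queries that are not near-duplicates.

    Assumption: "distinct" is approximated by box overlap (IoU) only.
    """
    order = np.argsort(-scores)  # highest score first
    iou = pairwise_iou(boxes)
    keep = []
    for idx in order:
        # Keep a query only if it overlaps every kept query by at most iou_thr.
        if all(iou[idx, k] <= iou_thr for k in keep):
            keep.append(int(idx))
        if len(keep) == max_queries:
            break
    return keep


# Four dense queries: 1 and 3 are near-duplicates of query 0.
dense_boxes = np.array([
    [10, 10, 50, 50],
    [11, 11, 51, 51],
    [100, 100, 150, 150],
    [12, 10, 52, 50],
], dtype=float)
dense_scores = np.array([0.9, 0.8, 0.7, 0.6])

distinct = select_distinct_queries(dense_boxes, dense_scores)
print(distinct)  # [0, 2]
```

In this toy setting only the surviving queries (indices 0 and 2) would take part in one-to-one assignment, so each object competes through a single candidate rather than a cluster of near-identical ones.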