Modern detection transformers (DETRs) use a set of object queries to predict a list of bounding boxes, sort them by classification confidence score, and select the top-ranked predictions as the final detection results for the input image. A high-performing object detector therefore requires accurate ranking of its bounding box predictions. In DETR-based detectors, the top-ranked bounding boxes often have poor localization quality because classification scores are misaligned with localization accuracy, which impedes the construction of high-quality detectors. In this work, we introduce a simple, high-performing DETR-based object detector built on a series of rank-oriented designs, collectively called Rank-DETR. Our key contributions are: (i) a rank-oriented architecture design that promotes positive predictions and suppresses negative ones to lower the false positive rate, and (ii) a rank-oriented loss function and matching cost design that prioritizes predictions with higher localization accuracy during ranking, boosting AP under high IoU thresholds. We apply our method to recent SOTA methods (e.g., H-DETR and DINO-DETR) and report strong COCO object detection results with different backbones such as ResNet-$50$, Swin-T, and Swin-L, demonstrating the effectiveness of our approach. Code is available at \url{https://github.com/LeapLabTHU/Rank-DETR}.
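The ranking problem described in the abstract can be illustrated with a small sketch. This is not the Rank-DETR method or code from the linked repository; it only shows, under assumed toy values, how selecting the top-ranked predictions by classification score can discard the best-localized box. The helper names `iou` and `select_top_k`, the boxes, and the scores are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Return indices of the k predictions with the highest classification score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data (hypothetical): one ground-truth box and three predictions.
gt: Box = (0.0, 0.0, 10.0, 10.0)
boxes: List[Box] = [
    (0.0, 0.0, 10.0, 10.0),   # perfectly localized, but lowest score
    (2.0, 2.0, 12.0, 12.0),   # poorly localized, but highest score
    (0.0, 0.0, 9.0, 10.0),    # well localized, middle score
]
scores = [0.60, 0.90, 0.75]

top = select_top_k(scores, k=2)
ious = [iou(b, gt) for b in boxes]

for i in top:
    print(f"kept box {i}: score={scores[i]:.2f}, IoU={ious[i]:.3f}")
```

With these toy values, the box that matches the ground truth exactly is ranked last and dropped, while a box with much lower IoU is ranked first. This is the kind of score/localization misalignment the rank-oriented designs aim to reduce.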