Object detection is an important topic in computer vision. Post-processing, an essential part of the typical object detection pipeline, is a significant bottleneck that limits the performance of traditional object detection models. The detection transformer (DETR), the first end-to-end object detection model, removes the need for hand-designed components such as anchors and non-maximum suppression (NMS), significantly simplifying the object detection process. However, compared with most traditional object detection models, DETR converges very slowly, and the meaning of its queries is obscure. Inspired by the step-by-step concept, this paper therefore proposes a new two-stage object detection model, named DETR with YOLO (DEYO), which relies on progressive inference to solve these problems. DEYO is a two-stage architecture comprising a classic object detection model as the first stage and a DETR-like model as the second. Specifically, the first stage provides high-quality queries and anchors to the second stage, improving the second stage's performance and efficiency compared with the original DETR model. Meanwhile, the second stage compensates for the performance degradation caused by the limitations of the first-stage detector. Extensive experiments on the COCO dataset show that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively, using ResNet-50 as the backbone with multi-scale features. Compared with DINO, a state-of-the-art DETR-like model, DEYO improves performance by 1.6 AP and 1.2 AP in the two epoch settings, respectively.
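The handoff between the two stages can be illustrated with a minimal sketch. The abstract states only that the first stage supplies queries and anchors to the second stage. Everything else here is an assumption made for illustration and is not taken from the paper: the function name `select_anchors`, the `top_k` and `min_score` parameters, the corner-format pixel boxes, and the normalized (cx, cy, w, h) anchor layout.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def select_anchors(
    boxes: List[Box],
    scores: List[float],
    image_size: Tuple[int, int],
    top_k: int = 2,
    min_score: float = 0.25,
) -> List[Tuple[float, float, float, float]]:
    """Turn first-stage detections into normalized (cx, cy, w, h) anchors.

    Hypothetical helper: the selection rule (score threshold, then top-k)
    is an illustrative assumption, not the method described in the paper.
    """
    width, height = image_size
    # Keep confident detections only, ordered from highest to lowest score.
    ranked = sorted(
        (pair for pair in zip(scores, boxes) if pair[0] >= min_score),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    anchors = []
    for _, (x1, y1, x2, y2) in ranked:
        anchors.append((
            (x1 + x2) / 2 / width,   # normalized center x
            (y1 + y2) / 2 / height,  # normalized center y
            (x2 - x1) / width,       # normalized box width
            (y2 - y1) / height,      # normalized box height
        ))
    return anchors


if __name__ == "__main__":
    detections = [(0, 0, 100, 50), (50, 50, 150, 150), (10, 10, 20, 20)]
    confidences = [0.9, 0.1, 0.6]
    print(select_anchors(detections, confidences, image_size=(200, 200)))
```

In a DETR-like second stage, anchors of this kind would typically initialize the decoder's reference boxes in place of learned or random ones. How DEYO actually constructs its queries is not specified in the abstract.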