Query-based object detectors have advanced significantly since the publication of DETR. However, most existing methods still rely on multi-stage encoders, multi-stage decoders, or a combination of both. Although it achieves high accuracy, the multi-stage paradigm (typically six stages) carries a heavy computational burden, which prompts us to reconsider its necessity. In this paper, we explore multiple techniques for enhancing query-based detectors and, based on these findings, propose a novel model called GOLO (Global Once and Local Once), which follows a two-stage decoding paradigm. Compared with other mainstream query-based models that use multi-stage decoders, our model employs fewer decoder stages while still achieving strong performance. Experimental results on the COCO dataset demonstrate the effectiveness of our approach.