Query-based methods have garnered significant attention in object detection since the advent of DETR, the pioneering end-to-end query-based detector. However, these methods suffer from slow convergence and suboptimal performance. Notably, self-attention in object detection often hampers convergence because of its global focus. To address these issues, we propose FoLR, a transformer-like, decoder-only architecture. We enhance the self-attention mechanism by isolating connections between irrelevant objects, which makes it focus on local rather than global regions. We also design an adaptive sampling method that extracts effective features from the feature maps based on the queries' local regions. Additionally, we employ a look-back strategy that lets the decoders retain prior information, followed by the Feature Mixer module, which fuses features and queries. Experimental results show that FoLR achieves state-of-the-art performance among query-based detectors, excelling in convergence speed and computational efficiency.
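The idea of "isolating connections between irrelevant objects" can be made concrete with a masked self-attention step. The abstract does not say how relevance is decided, so the sketch below is only an illustration under a loudly stated assumption: two queries count as relevant when their predicted boxes overlap (IoU above a threshold), and each query always attends to itself. The helper names `iou`, `relevance_mask`, and `masked_self_attention` are hypothetical. They are not taken from FoLR, and the code omits learned projections (it uses Q = K = V = the query features). Masked pairs are left out of the softmax, so their attention weight is exactly zero.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def relevance_mask(boxes: Sequence[Box], thr: float = 0.0) -> List[List[bool]]:
    """ASSUMED criterion (not from the paper): query i may attend to query j
    only if their boxes overlap with IoU > thr. Self-connections are kept."""
    n = len(boxes)
    return [[i == j or iou(boxes[i], boxes[j]) > thr for j in range(n)]
            for i in range(n)]


def masked_self_attention(x: Sequence[Sequence[float]],
                          mask: Sequence[Sequence[bool]]) -> List[List[float]]:
    """Single-head scaled dot-product attention with Q = K = V = x.
    Only allowed pairs enter the softmax, so isolated queries get zero weight."""
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        allowed = [j for j in range(n) if mask[i][j]]  # never empty: self is allowed
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in allowed]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * x[j][k] for w, j in zip(weights, allowed))
                    for k in range(d)])
    return out


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (50, 50, 60, 60)]
    print(relevance_mask(boxes))
    # -> [[True, True, False], [True, True, False], [False, False, True]]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(masked_self_attention(feats, relevance_mask(boxes)))
```

In this toy example, the third query's box does not overlap the other two. Its output is therefore just its own feature, while the first two queries mix only with each other. Real detectors usually get the same effect by setting masked logits to negative infinity inside a batched attention kernel. The explicit loop is used here only for readability.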