Query-based methods have attracted significant attention in object detection since the introduction of DETR, the pioneering query-based detector. However, these methods suffer from slow convergence and suboptimal performance. Notably, self-attention often hampers convergence in object detection because it attends globally. To address these issues, we propose FoLR, a transformer-like architecture consisting only of decoders. We improve self-attention by isolating connections between irrelevant objects, so that it focuses on local regions rather than global ones. We also design an adaptive sampling method that extracts effective features from the feature maps based on the queries' local regions. Additionally, we employ a look-back strategy that lets decoders retain previous information, followed by a Feature Mixer module that fuses features and queries. Experimental results show that FoLR achieves state-of-the-art performance among query-based detectors, with fast convergence and high computational efficiency.
Index Terms: Local regions, Attention mechanism, Object detection
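To make the idea of isolating connections between irrelevant objects concrete, here is a minimal sketch of self-attention among queries with a locality mask. The abstract does not say how relevance is decided, so this sketch assumes, purely for illustration, that two queries are relevant to each other when their predicted boxes overlap (IoU above a threshold). The names `iou` and `local_self_attention` and the parameter `iou_threshold` are hypothetical, and the learned query, key, and value projections of a real attention layer are omitted for brevity.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def local_self_attention(queries, boxes, iou_threshold=0.0):
    """Single-head self-attention restricted to queries with overlapping boxes.

    Assumption (not from the abstract): box overlap defines which queries
    are "relevant" to each other. Returns (outputs, attention_weights).
    """
    n, d = len(queries), len(queries[0])
    outputs, weights = [], []
    for i in range(n):
        # Query i attends to itself and to queries whose boxes overlap its own.
        allowed = [
            j for j in range(n)
            if j == i or iou(boxes[i], boxes[j]) > iou_threshold
        ]
        # Scaled dot-product scores over the allowed neighbours only.
        scores = [
            sum(qi * qj for qi, qj in zip(queries[i], queries[j])) / math.sqrt(d)
            for j in allowed
        ]
        # Numerically stable softmax; masked-out queries keep weight 0.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [0.0] * n
        for j, e in zip(allowed, exps):
            row[j] = e / total
        weights.append(row)
        # Output is the attention-weighted sum of the query vectors.
        outputs.append(
            [sum(row[j] * queries[j][k] for j in range(n)) for k in range(d)]
        )
    return outputs, weights


# Two overlapping boxes and one distant box.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
outputs, weights = local_self_attention(queries, boxes)
```

In this toy input the third query overlaps no other box, so it attends only to itself, while the first two queries exchange information with each other. With global self-attention all three would be mixed together, which is the behaviour the abstract argues slows convergence.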