Channel attention and spatial attention have brought significant improvements in extracting feature dependencies and spatial structure relations, respectively, for various downstream vision tasks. Although combining them is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attention has not been fully explored. In particular, existing approaches do not fully harness the synergistic potential of multi-semantic information for feature guidance and for mitigating semantic disparities. This study investigates the synergistic relationship between spatial and channel attention at multiple semantic levels and proposes a novel Spatial and Channel Synergistic Attention module (SCSA). SCSA consists of two parts: the Shareable Multi-Semantic Spatial Attention (SMSA) and the Progressive Channel-wise Self-Attention (PCSA). SMSA integrates multi-semantic information and uses a progressive compression strategy to inject discriminative spatial priors into PCSA's channel self-attention, effectively guiding channel recalibration. In turn, the robust feature interactions provided by the self-attention mechanism in PCSA further mitigate the disparities in multi-semantic information among the different sub-features within SMSA. We conduct extensive experiments on seven benchmark datasets, covering classification on ImageNet-1K, object detection on MSCOCO 2017, segmentation on ADE20K, and four other complex-scene detection datasets. The results demonstrate that the proposed SCSA not only surpasses current state-of-the-art attention methods but also exhibits enhanced generalization capabilities across various task scenarios. The code and models are available at: https://github.com/HZAI-ZJNU/SCSA.
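To make the data flow described above concrete, the following is a minimal NumPy sketch of the serial "spatial attention first, then channel self-attention" arrangement. It is not the authors' implementation (see the repository linked above for that). The function names `spatial_gate`, `channel_self_attention`, and `scsa_sketch`, the parameter `pooled_size`, and every design detail here are illustrative assumptions: the spatial stage uses plain row and column means in place of SMSA's learned multi-semantic processing, and the channel stage uses unprojected features as queries, keys, and values with no learned weights or normalization.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_gate(x):
    """Toy stand-in for SMSA (assumption): gate each position of a
    (C, H, W) feature map by its per-channel row and column means."""
    row = sigmoid(x.mean(axis=2, keepdims=True))  # (C, H, 1)
    col = sigmoid(x.mean(axis=1, keepdims=True))  # (C, 1, W)
    return x * row * col


def channel_self_attention(x, pooled_size=2):
    """Toy stand-in for PCSA (assumption): compress spatially, then let
    channels attend to one another and rescale each channel."""
    c, h, w = x.shape
    assert h % pooled_size == 0 and w % pooled_size == 0
    ph, pw = h // pooled_size, w // pooled_size
    # Compression step: average-pool to a pooled_size x pooled_size grid.
    pooled = x.reshape(c, pooled_size, ph, pooled_size, pw).mean(axis=(2, 4))
    tokens = pooled.reshape(c, -1)  # one token per channel
    # Channel-to-channel similarity, scaled by the token length.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])  # (C, C)
    mixed = softmax(scores, axis=-1) @ tokens  # (C, pooled_size**2)
    weights = sigmoid(mixed.mean(axis=1))  # (C,) recalibration weights
    return x * weights[:, None, None]


def scsa_sketch(x):
    # Spatially gated features feed the channel stage, so the spatial
    # prior influences which channels are emphasized.
    return channel_self_attention(spatial_gate(x))


x = np.random.default_rng(0).standard_normal((8, 4, 4))
y = scsa_sketch(x)
print(y.shape)
```

Because every gate in this sketch is a sigmoid output in (0, 1), the module can only attenuate activations, and the output keeps the input's shape, which is what lets an attention block of this kind be dropped into an existing backbone.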