RFAConv: Innovating Spatial Attention and Standard Convolutional Operation

Spatial attention has been widely used to improve the performance of convolutional neural networks, but it has certain limitations. In this paper, we offer a new perspective on why spatial attention is effective: the spatial attention mechanism essentially solves the problem of convolutional kernel parameter sharing. However, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. We therefore propose a novel attention mechanism called Receptive-Field Attention (RFA). Existing spatial attention mechanisms, such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA), focus only on spatial features and therefore do not fully address the problem of convolutional kernel parameter sharing. In contrast, RFA not only focuses on receptive-field spatial features but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), built on RFA, offers a new way to replace the standard convolution operation. It adds an almost negligible amount of computation and parameters while significantly improving network performance. We conducted a series of experiments on the ImageNet-1k, COCO, and VOC datasets to demonstrate the superiority of our approach. Most importantly, we believe it is time for current spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features, which can further improve network performance. The code and pre-trained models for the relevant tasks are available at https://github.com/Liuchen1997/RFAConv.
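The parameter-sharing argument can be made concrete by viewing each method as a per-window "effective kernel" applied to a single-channel input (stride 1, no padding). The pure-Python sketch below does this under stated assumptions. Standard convolution reuses one kernel everywhere. A CBAM-style spatial attention map multiplies each pixel by one weight, so the effective kernel varies with position, but a pixel covered by several overlapping windows keeps the same attention factor in all of them. A receptive-field attention assigns a separate, softmax-normalised weight to every offset inside every window. The helper names, the toy parameters `w_att` and `b_att`, the pooled-logit attention, and the use of raw window values as receptive-field features are all illustrative assumptions, not the authors' RFAConv implementation.

```python
import math

def window_corners(h, w, k):
    """Top-left corners of every valid k x k window (stride 1, no padding)."""
    return [(i, j) for i in range(h - k + 1) for j in range(w - k + 1)]

def conv_with(x, k, eff):
    """Convolve 2-D list x, using eff(i, j) as the k x k kernel for window (i, j).

    Assumption: the receptive-field features are simply the raw window values of x.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * (w - k + 1) for _ in range(h - k + 1)]
    for i, j in window_corners(h, w, k):
        e = eff(i, j)
        out[i][j] = sum(e[u][v] * x[i + u][j + v] for u in range(k) for v in range(k))
    return out

def standard_kernels(kernel):
    """Standard convolution: every window shares the same kernel."""
    return lambda i, j: kernel

def spatial_attention_kernels(kernel, attn):
    """CBAM-style spatial attention: one weight attn[r][c] per pixel.

    The effective kernel kernel[u][v] * attn[i+u][j+v] depends on the window, but a
    pixel covered by several overlapping windows gets the same attention factor in each.
    """
    k = len(kernel)
    return lambda i, j: [[kernel[u][v] * attn[i + u][j + v] for v in range(k)]
                         for u in range(k)]

def rfa_kernels(kernel, x, w_att, b_att):
    """Toy receptive-field attention; w_att and b_att are hypothetical k x k parameters.

    Each window is average-pooled to one scalar, mapped to k*k logits (one per offset),
    and softmax-normalised inside the window, so every window gets its own k*k weights.
    """
    k = len(kernel)

    def eff(i, j):
        mean = sum(x[i + u][j + v] for u in range(k) for v in range(k)) / (k * k)
        logits = [[w_att[u][v] * mean + b_att[u][v] for v in range(k)] for u in range(k)]
        top = max(max(row) for row in logits)  # subtract the max for a stable softmax
        exps = [[math.exp(l - top) for l in row] for row in logits]
        total = sum(sum(row) for row in exps)
        return [[kernel[u][v] * exps[u][v] / total for v in range(k)] for u in range(k)]

    return eff

# Toy data (all hypothetical): a 4 x 4 input, a 3 x 3 kernel of ones, a per-pixel map.
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
attn = [[0.5 + 0.1 * c for c in range(4)] for _ in range(4)]
w_att = [[0.1 * (3 * u + v) for v in range(3)] for u in range(3)]
b_att = [[0.0] * 3 for _ in range(3)]

print(conv_with(x, 3, standard_kernels(ones)))  # [[45.0, 54.0], [81.0, 90.0]]

# Attention factor applied to pixel (1, 1) in each of the four windows that cover it
# (the kernel is all ones, so the effective weight equals the attention factor).
sa = spatial_attention_kernels(ones, attn)
rfa = rfa_kernels(ones, x, w_att, b_att)
for i, j in window_corners(4, 4, 3):
    print((i, j), sa(i, j)[1 - i][1 - j], round(rfa(i, j)[1 - i][1 - j], 5))
```

In this toy run, pixel (1, 1) receives the factor `attn[1][1]` in all four windows under spatial attention, but four different factors under the receptive-field variant. One way to read the abstract's point about large kernels: as k grows, each pixel is shared by up to k*k windows, so a single per-pixel weight must serve more and more windows, whereas a receptive-field attention supplies k*k weights per window.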
