Attention mechanisms have become a core component of deep learning models, with channel attention and spatial attention being the two most representative designs. Current research on their fusion strategies bifurcates primarily into sequential and parallel paradigms, yet the choice between them remains largely empirical, lacking systematic analysis and unified principles. We systematically compare channel-spatial attention combinations under a unified framework, building an evaluation suite of 18 topologies across four classes: sequential, parallel, multi-scale, and residual. Across two natural-image and nine medical imaging datasets, we uncover a coupling law among data scale, method, and performance: (1) on few-shot tasks, the "channel then multi-scale spatial" cascaded structure achieves the best performance; (2) on medium-scale tasks, parallel architectures with learnable fusion perform best; (3) on large-scale tasks, parallel structures with dynamic gating yield the best results. Experiments further indicate that the "spatial then channel" order is more stable and effective for fine-grained classification, while residual connections mitigate vanishing gradients across data scales. We distill these findings into scenario-based guidelines for building future attention modules. Code is open-sourced at https://github.com/DWlzm.
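To make the two main topology classes concrete, the sketch below shows a sequential "channel then spatial" block with a residual connection and a parallel block with a learnable fusion weight. This is a minimal illustration in PyTorch, not the paper's actual code: the SE-style channel branch, CBAM-style spatial branch, reduction ratio, kernel size, and the scalar fusion parameter `alpha` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> bottleneck MLP -> sigmoid gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # per-channel reweighting

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-pooled maps -> conv -> sigmoid gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)          # average over channels
        mx, _ = x.max(dim=1, keepdim=True)         # max over channels
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                             # per-location reweighting

class SequentialCS(nn.Module):
    """Sequential topology: channel first, then spatial, plus a residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return x + self.sa(self.ca(x))  # residual path eases gradient flow

class ParallelCS(nn.Module):
    """Parallel topology: both branches see the input; a learnable scalar fuses them."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.alpha = nn.Parameter(torch.tensor(0.0))  # illustrative fusion weight

    def forward(self, x):
        a = torch.sigmoid(self.alpha)  # keep fusion weight in (0, 1)
        return x + a * self.ca(x) + (1 - a) * self.sa(x)
```

Both blocks preserve the input shape, so they can be dropped between convolutional stages of a backbone; swapping `SequentialCS` for `ParallelCS` is the kind of topology change the study compares across data scales.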