Spiking neural networks (SNNs) mimic the brain's computational strategies and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor in human perception, visual attention refers to the dynamic process of selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. In the present study, inspired by experimental observations of predictive attentional remapping, we propose a new spatial-channel-temporal-fused attention (SCTFA) module that guides SNNs to efficiently capture underlying target regions by using accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model is strongly robust to noise and remains highly stable when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to enhancing the capabilities of SNNs.
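To make the idea of attention driven by accumulated historical spatial-channel information more concrete, a minimal sketch follows. It is not the SCTFA module itself: the abstract does not specify the module's layers or equations, so the function name `history_attention`, the leaky accumulation with a `decay` factor, the mean-centred sigmoid gates, and the choice to include the current time step in the history are all assumptions made purely for illustration. A learned attention module would replace these parameter-free gates with trainable layers.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def history_attention(events, decay=0.9):
    """Gate an event stream with attention computed from accumulated history.

    Hypothetical illustration only, not the published SCTFA module.

    events: array of shape (T, C, H, W) holding event or spike counts.
    decay:  assumed leak factor applied to the running history.
    Returns an array of the same shape with each time step rescaled by
    a channel gate and a spatial gate, both with values in (0, 1).
    """
    T, C, H, W = events.shape
    history = np.zeros((C, H, W))
    out = np.empty_like(events, dtype=float)

    for t in range(T):
        # Leaky accumulation of activity up to and including step t.
        history = decay * history + events[t]

        # Channel gate: one weight per channel, from spatially pooled history.
        pooled_c = history.mean(axis=(1, 2))                 # shape (C,)
        channel_gate = sigmoid(pooled_c - pooled_c.mean())

        # Spatial gate: one weight per pixel, from channel-pooled history.
        pooled_s = history.mean(axis=0)                      # shape (H, W)
        spatial_gate = sigmoid(pooled_s - pooled_s.mean())

        # Fuse both gates with the current frame by broadcasting.
        out[t] = (events[t]
                  * channel_gate[:, None, None]
                  * spatial_gate[None, :, :])
    return out
```

In this sketch, a location and channel that have been active repeatedly receive a larger gate value at later time steps, so persistent activity is emphasised while locations with no accumulated activity are attenuated.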