Spiking neural networks (SNNs) mimic the brain's computational strategies and exhibit substantial capabilities in spatiotemporal information processing. Visual attention, an essential factor in human perception, is the dynamic process by which biological vision systems select salient regions. Although visual attention mechanisms have achieved great success in computer vision applications, they have rarely been introduced into SNNs. Inspired by experimental observations of predictive attentional remapping, we propose a new spatial-channel-temporal-fused attention (SCTFA) module that guides SNNs to efficiently capture underlying target regions by utilizing accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model is highly robust to noise and shows outstanding stability when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevating the capabilities of SNNs.