Spiking Neural Networks (SNNs) offer notable advantages in biological plausibility and energy efficiency, making them promising candidates for building low-power Transformers. However, existing Spiking Transformers largely adhere to a passive reactive paradigm, which struggles to focus on task-relevant information and incurs substantial computational overhead when processing redundant visual data. To overcome this fundamental yet underexplored limitation, we propose SAFformer, a novel Spiking Transformer architecture based on an active predictive filtering paradigm. Inspired by the brain's predictive coding mechanism, SAFformer actively suppresses predictable signals and focuses on salient visual features. Extensive experiments show that SAFformer establishes new state-of-the-art performance on CIFAR-10/100 and CIFAR10-DVS. Remarkably, on ImageNet-1K, it achieves 80.44% Top-1 accuracy with only 26.58M parameters and an energy consumption of 5.88 mJ, demonstrating an exceptional balance between accuracy and efficiency.