Advances in adapting deep convolutional architectures to Spiking Neural Networks (SNNs) have significantly improved image classification performance and reduced computational burdens. However, Multiplication-Free Inference (MFI) is incompatible with the attention and transformer mechanisms that are critical to superior performance on high-resolution vision tasks, which limits these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to enhance local feature extraction. As a result, we establish an efficient multi-stage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pre-training or sophisticated SNN training techniques, our network achieves a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we reduce computational costs, model parameters, and simulation steps. An expanded version of our network reaches a top-1 accuracy of 71.64%, comparable to that of the spiking VGG-16 network, while operating with a model capacity 2.1 times smaller. Our findings highlight the potential of our deep SNN architecture to integrate global and local learning abilities effectively. Interestingly, the trained receptive fields in our network mirror the activity patterns of cortical cells. Source code is publicly accessible at https://github.com/EMI-Group/mixer-snn.
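The role of batch normalization in retaining MFI compatibility can be illustrated with a minimal NumPy sketch. It assumes the standard inference-time trick of folding batch-norm statistics into the preceding linear layer, so that a binary spike vector only selects and sums weight columns. The function names `fold_batch_norm` and `accumulate_spikes`, the layer sizes, and the random parameters are hypothetical and chosen for illustration; this is not taken from the authors' repository.

```python
import numpy as np


def fold_batch_norm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch-norm statistics into a linear layer.

    BN(Wx + b) = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
               = W_folded @ x + b_folded
    """
    scale = gamma / np.sqrt(var + eps)      # one scale per output unit
    W_folded = W * scale[:, None]           # rescale each output row once, offline
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded


def accumulate_spikes(W_folded, b_folded, spikes):
    """Evaluate the folded layer on a binary spike vector with additions only.

    Because every input is 0 or 1, the matrix-vector product reduces to
    summing the weight columns of the inputs that fired.
    """
    active = np.flatnonzero(spikes)         # indices of inputs that spiked
    return W_folded[:, active].sum(axis=1) + b_folded


# Hypothetical layer sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_in, n_out = 8, 4
W = rng.normal(size=(n_out, n_in))
b = rng.normal(size=n_out)
gamma = rng.normal(size=n_out)
beta = rng.normal(size=n_out)
mean = rng.normal(size=n_out)
var = rng.uniform(0.5, 2.0, size=n_out)
spikes = rng.integers(0, 2, size=n_in)      # binary spike input

# Reference: explicit linear layer followed by batch norm (uses multiplications).
reference = gamma * (W @ spikes + b - mean) / np.sqrt(var + 1e-5) + beta

# Folded version: the per-spike work at inference is accumulation only.
W_f, b_f = fold_batch_norm(W, b, gamma, beta, mean, var)
folded = accumulate_spikes(W_f, b_f, spikes)

print(np.allclose(reference, folded))
```

The folding is done once after training, so the multiplications it requires do not occur during inference. A normalization whose statistics depend on the current input cannot be precomputed in this way, which is one plausible reading of why batch normalization is the choice highlighted in the abstract.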