Advances in adapting deep convolutional architectures to Spiking Neural Networks (SNNs) have significantly improved image classification performance while reducing computational cost. However, Multiplication-Free Inference (MFI) is incompatible with the attention and transformer mechanisms that are critical to strong performance on high-resolution vision tasks, which limits these gains. To address this, we explore a different path, drawing on recent progress in Multi-Layer Perceptrons (MLPs). We propose a spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to strengthen local feature extraction. The result is an efficient multi-stage spiking MLP network that combines global receptive fields with local feature extraction for comprehensive spike-based computation. Without pre-training or sophisticated SNN training techniques, our network achieves a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. It also reduces computational cost, model capacity, and the number of simulation steps. A larger version of our network rivals the spiking VGG-16 network with a top-1 accuracy of 71.64% while using a model capacity 2.1 times smaller. These findings highlight the potential of our deep SNN architecture to integrate global and local learning. Interestingly, the trained receptive fields in our network resemble the activity patterns of cortical cells.
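The abstract does not spell out how batch normalization stays compatible with MFI. One common reading, assumed here rather than taken from the paper, is that at inference time the normalization is folded into the preceding linear layer, so a binary spike input only selects which weights to add. The sketch below illustrates that idea with NumPy; the function names `fold_batchnorm` and `accumulate_only_linear`, the layer sizes, and the random parameters are all hypothetical.

```python
import numpy as np

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into the preceding linear layer.

    Assumption: batch norm uses fixed running statistics (mean, var) at
    inference, so it reduces to a per-output affine transform.
    """
    scale = gamma / np.sqrt(var + eps)             # one scale per output unit
    folded_weight = weight * scale[:, None]        # scale each output row
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

def accumulate_only_linear(spikes, weight, bias):
    """Evaluate weight @ spikes + bias for binary spikes using additions only.

    A spike of 1 selects a weight column; a spike of 0 skips it, so no
    multiplication by the input is needed.
    """
    active = spikes.astype(bool)
    return weight[:, active].sum(axis=1) + bias

# Hypothetical layer: 8 binary inputs, 4 outputs, random parameters.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
b = rng.normal(size=4)
gamma = rng.normal(size=4)
beta = rng.normal(size=4)
mean = rng.normal(size=4)
var = rng.uniform(0.5, 2.0, size=4)
spikes = rng.integers(0, 2, size=8)

# Reference: linear layer followed by explicit batch norm (uses multiplications).
reference = gamma * ((W @ spikes + b) - mean) / np.sqrt(var + 1e-5) + beta

# Folded path: fold once offline, then accumulate selected weight columns.
W_folded, b_folded = fold_batchnorm(W, b, gamma, beta, mean, var)
folded = accumulate_only_linear(spikes, W_folded, b_folded)

print(np.allclose(reference, folded))
```

The folding itself uses multiplications, but it runs once before deployment; under this assumption the per-input work at inference is accumulation of pre-scaled weights followed by the spiking neuron's threshold test.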