Deep spiking neural networks (SNNs) have drawn much attention in recent years because of their low power consumption, biological plausibility, and event-driven nature. However, state-of-the-art deep SNNs (including Spikformer and Spikingformer) suffer from a critical problem: imprecise gradient backpropagation. This problem arises from the improper design of the downsampling modules in these networks and greatly hampers overall model performance. In this paper, we propose ConvBN-MaxPooling-LIF (CML), an SNN-optimized downsampling module with precise gradient backpropagation. We prove theoretically that CML effectively overcomes the imprecision of gradient backpropagation. In addition, we evaluate CML on the ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, and DVS128-Gesture datasets, and show state-of-the-art performance on all of them, with significant improvements over Spikingformer. For instance, our model achieves 77.64$\%$ on ImageNet, 96.04$\%$ on CIFAR10, and 81.4$\%$ on CIFAR10-DVS, which is +1.79$\%$ on ImageNet and +1.16$\%$ on CIFAR100 compared with Spikingformer.
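As the name ConvBN-MaxPooling-LIF suggests, the module applies convolution with batch normalization, then max pooling, then the LIF spiking neuron, so pooling acts on real-valued activations rather than on binary spikes. The following is a minimal one-dimensional sketch of that ordering only, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the parameter-free normalization standing in for batch normalization, the pooling size of 2, and the LIF constants (`tau`, `v_th`, hard reset to zero).

```python
def conv_bn_1d(x, kernel, eps=1e-5):
    """Valid 1-D convolution followed by a parameter-free normalization.

    Assumption: normalizing over positions is only a stand-in for ConvBN.
    """
    k = len(kernel)
    y = [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    return [(v - mean) / (var + eps) ** 0.5 for v in y]


def max_pool_1d(x, size=2):
    """Non-overlapping max pooling over real-valued inputs."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def lif_step(current, v, tau=2.0, v_th=1.0):
    """One LIF update: leaky integration, threshold firing, hard reset to 0.

    Assumption: tau, v_th and the reset rule are illustrative choices.
    """
    spikes, v_next = [], []
    for c, vi in zip(current, v):
        h = vi + (c - vi) / tau          # leaky integration of the input
        s = 1.0 if h >= v_th else 0.0    # emit a spike at the threshold
        spikes.append(s)
        v_next.append(0.0 if s else h)   # reset the potential after a spike
    return spikes, v_next


def cml_downsample(frames, kernel):
    """Apply ConvBN -> MaxPooling -> LIF to each time step of `frames`."""
    v = None
    out = []
    for x in frames:
        pooled = max_pool_1d(conv_bn_1d(x, kernel))
        if v is None:
            v = [0.0] * len(pooled)      # membrane potential starts at rest
        s, v = lif_step(pooled, v)
        out.append(s)
    return out
```

Calling `cml_downsample(frames, kernel)` on a list of equal-length input vectors, one per time step, returns one binary spike vector per time step at roughly half the spatial resolution of the convolution output. The pooling step sees the normalized activations before any thresholding happens, which is the design point the module name encodes.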