Deep spiking neural networks (SNNs) have drawn much attention in recent years because of their low power consumption, biological plausibility, and event-driven nature. However, state-of-the-art deep SNNs (including Spikformer and Spikingformer) suffer from a critical problem: imprecise gradient backpropagation. This problem arises from the improper design of the downsampling modules in these networks, and it greatly hampers overall model performance. In this paper, we propose ConvBN-MaxPooling-LIF (CML), an SNN-optimized downsampling module with precise gradient backpropagation. We prove theoretically that CML effectively overcomes the imprecision of gradient backpropagation. In addition, we evaluate CML on the ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, and DVS128-Gesture datasets, where it achieves state-of-the-art performance on all of them and significant improvements over Spikingformer. For instance, our model achieves 77.64$\%$ on ImageNet, 96.04$\%$ on CIFAR10, and 81.4$\%$ on CIFAR10-DVS, with gains of +1.79$\%$ on ImageNet and +1.16$\%$ on CIFAR100 compared with Spikingformer.
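As a rough illustration of the operator order that the name ConvBN-MaxPooling-LIF suggests, the sketch below applies max pooling to real-valued pre-activations and only then passes the pooled values through a spiking neuron. It is a toy, one-dimensional, single-time-step example rather than the paper's implementation: the function names (`max_pool_1d`, `lif_step`, `cml_downsample`), the LIF update rule, and the parameters `threshold`, `tau`, and `v_reset` are all assumptions made for illustration, and the Conv+BN stage is taken as already computed.

```python
def max_pool_1d(values, kernel=2):
    """Non-overlapping max pooling over a flat list (stride equals kernel)."""
    return [max(values[i:i + kernel])
            for i in range(0, len(values) - kernel + 1, kernel)]


def lif_step(inputs, membrane, threshold=1.0, tau=2.0, v_reset=0.0):
    """One time step of a simple LIF layer: leaky charge, fire, hard reset.

    The update rule and default parameters are assumed for this sketch.
    """
    spikes, new_membrane = [], []
    for x, v in zip(inputs, membrane):
        v = v + (x - (v - v_reset)) / tau      # leaky integration
        s = 1 if v >= threshold else 0         # fire when the threshold is reached
        spikes.append(s)
        new_membrane.append(v_reset if s else v)
    return spikes, new_membrane


def cml_downsample(pre_activations, membrane):
    """Hypothetical CML-style ordering: (Conv+BN output) -> max pool -> LIF.

    `pre_activations` stands in for the real-valued output of Conv+BN.
    """
    pooled = max_pool_1d(pre_activations)
    return lif_step(pooled, membrane)


# Four real-valued pre-activations are pooled to two, then turned into spikes.
spikes, membrane = cml_downsample([0.5, 2.4, 1.0, 0.1], [0.0, 0.0])
print(spikes, membrane)  # [1, 0] [0.0, 0.5]
```

In this ordering the pooling operator sees real-valued inputs and the spiking neuron is the last stage of the module, so the module's output is still a binary spike train.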