Training deep spiking neural networks (SNNs) remains challenging due to sharp loss landscapes and temporal inconsistency caused by surrogate gradients. To address these challenges, we propose a unified framework, adaptive and asymmetric surrogate gradients (A2SG). The adaptive gradients adjust an effective window for spatio-temporal adaptation, reducing spatial gradient variation and maintaining directional consistency of gradients over time. The asymmetric gradients reflect neuronal dynamics by assigning larger gradients to neurons with higher membrane potentials, and we prove that they yield lower variation than symmetric surrogates. Our analysis further establishes a direct connection between local gradient variation and the curvature of the loss landscape, providing a principled explanation for how A2SG promotes convergence to flatter minima and improves generalization. We conduct extensive experiments on diverse models, including CNN-based and Transformer-based SNNs, across tasks including image classification on both static and neuromorphic datasets and segmentation. The results demonstrate that A2SG consistently improves accuracy and energy efficiency, establishing it as a general and reliable solution for training deep SNNs. Our code is available at https://github.com/KIST-NCL/A2SG.git.
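To make the idea of an asymmetric surrogate concrete, the following is a minimal illustrative sketch, not the A2SG formulation itself. It assumes a triangular surrogate whose window is wider above the firing threshold than below it, so that a neuron with higher membrane potential receives a larger gradient than one at the same distance below the threshold. The function names, the parameters `theta`, `width`, `w_below`, and `w_above`, and the triangular shape are all hypothetical choices made for illustration.

```python
def symmetric_triangle(u, theta=1.0, width=1.0):
    """Symmetric triangular surrogate gradient centred on the threshold theta."""
    return max(0.0, 1.0 - abs(u - theta) / width) / width


def asymmetric_triangle(u, theta=1.0, w_below=0.5, w_above=1.5):
    """Hypothetical asymmetric variant (an assumption, not the paper's formula).

    The window extends w_below under the threshold and w_above over it.
    With w_above > w_below, potentials above the threshold get larger
    gradients than potentials at the same distance below it.
    """
    peak = 2.0 / (w_below + w_above)  # keeps the total area under the curve at 1
    d = u - theta
    w = w_above if d >= 0 else w_below
    return peak * max(0.0, 1.0 - abs(d) / w)


if __name__ == "__main__":
    # Compare both surrogates at equal distances on either side of the threshold.
    for u in (0.75, 1.0, 1.25):
        print(u, symmetric_triangle(u), asymmetric_triangle(u))
```

Under these assumed parameters, the symmetric surrogate returns the same value at membrane potentials 0.75 and 1.25, while the asymmetric one returns a larger value at 1.25. An adaptive scheme in the spirit of the abstract would additionally vary the window widths during training, which this sketch does not attempt.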