Sharpness-Aware Surrogate Training for Spiking Neural Networks

Surrogate gradients are a standard tool for training spiking neural networks (SNNs), but conventional hard-forward/surrogate-backward training couples a nonsmooth forward model with a biased gradient estimator. We study Sharpness-Aware Surrogate Training (SAST), which applies Sharpness-Aware Minimization (SAM) to a surrogate-forward SNN trained by backpropagation. In this formulation, the optimization target is an ordinary smooth empirical risk, so the training gradient is exact for the auxiliary model being optimized. Under explicit boundedness and contraction assumptions, we derive compact state-stability and input-Lipschitz bounds, establish smoothness of the surrogate objective, provide a first-order SAM approximation bound, and prove a nonconvex convergence guarantee for stochastic SAST with an independent second minibatch. We also isolate a local mechanism proposition, stated separately from the unconditional guarantees, that links per-sample parameter-gradient control to smaller input-gradient norms under local Jacobian conditioning. Empirically, we evaluate clean accuracy, hard-spike transfer, corruption robustness, and training overhead on N-MNIST and DVS Gesture. The clearest practical effect is a reduction of the transfer gap: on N-MNIST, hard-spike accuracy rises from 65.7% to 94.7% (best at $\rho=0.30$) while surrogate-forward accuracy remains high; on DVS Gesture, hard-spike accuracy improves from 31.8% to 63.3% (best at $\rho=0.40$). We additionally specify the compute-matched, calibration, and theory-alignment controls required for a final practical assessment.
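
A minimal sketch of one stochastic SAST update follows, written under loudly stated assumptions: a toy single leaky neuron with no membrane reset, a sigmoid surrogate spike $s_t = \sigma(k(v_t - \theta))$ used in the forward pass, and a squared-error loss on the spike rate. The helpers `forward`, `sast_step`, `make_sample`, `accuracy`, `train` and the constants $\beta$, $\theta$, $k$, learning rate, and toy $\rho$ are hypothetical illustrations, not the paper's implementation. The update follows the standard SAM recipe with two independent minibatches $B_1, B_2$: $w \leftarrow w - \eta \nabla L_{B_2}\big(w + \rho\, \nabla L_{B_1}(w) / \lVert \nabla L_{B_1}(w) \rVert\big)$. Because the surrogate forward pass is itself smooth, the gradient computed here is exact for this auxiliary model; `hard=True` swaps in a Heaviside spike for evaluation only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w, x_seq, beta=0.9, theta=1.0, k=4.0, hard=False):
    """Hypothetical toy leaky neuron: returns (spike rate, d rate / d w).

    Surrogate forward: s_t = sigmoid(k * (v_t - theta)), which is smooth in w.
    Hard forward (evaluation only): s_t = 1[v_t >= theta]; its gradient is returned as zeros.
    No membrane reset is applied, a simplification that keeps the exact gradient short.
    """
    T = len(x_seq)
    v, rate = 0.0, 0.0
    trace = [0.0] * len(w)  # trace_t = d v_t / d w = beta * trace_{t-1} + x_t
    grad = [0.0] * len(w)
    for x in x_seq:
        v = beta * v + sum(wi * xi for wi, xi in zip(w, x))
        trace = [beta * ei + xi for ei, xi in zip(trace, x)]
        if hard:
            rate += (1.0 if v >= theta else 0.0) / T
        else:
            s = sigmoid(k * (v - theta))
            rate += s / T
            ds_dv = k * s * (1.0 - s)
            grad = [gi + ds_dv * ei / T for gi, ei in zip(grad, trace)]
    return rate, grad


def batch_loss_and_grad(w, batch):
    # Mean squared error between surrogate spike rate and target rate, with its exact gradient.
    n = len(batch)
    loss, g = 0.0, [0.0] * len(w)
    for x_seq, y in batch:
        r, dr = forward(w, x_seq)
        loss += (r - y) ** 2 / n
        g = [gi + 2.0 * (r - y) * di / n for gi, di in zip(g, dr)]
    return loss, g


def sast_step(w, batch1, batch2, lr=0.02, rho=0.05):
    """One SAM step on the smooth surrogate objective.

    batch1 sets the ascent direction; an independent batch2 supplies the
    gradient at the perturbed point, which is then applied to the original weights.
    """
    _, g1 = batch_loss_and_grad(w, batch1)
    norm = math.sqrt(sum(gi * gi for gi in g1)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g1)]
    _, g2 = batch_loss_and_grad(w_adv, batch2)
    return [wi - lr * gi for wi, gi in zip(w, g2)]


def make_sample(rng, label, d=4, T=20):
    # Hypothetical toy event stream: channel `label` fires with p=0.6, the others with p=0.1.
    p = [0.6 if c == label else 0.1 for c in range(d)]
    x_seq = [[1.0 if rng.random() < p[c] else 0.0 for c in range(d)] for _ in range(T)]
    return x_seq, (0.9 if label == 1 else 0.1)


def accuracy(w, data, hard=False):
    # Predict label 1 when the spike rate exceeds 0.5; hard=True evaluates with hard spikes.
    hits = sum((forward(w, x, hard=hard)[0] > 0.5) == (y > 0.5) for x, y in data)
    return hits / len(data)


def train(steps=200, rho=0.05, lr=0.02, batch_size=8, seed=0):
    rng = random.Random(seed)

    def draw():
        return [make_sample(rng, rng.randint(0, 1)) for _ in range(batch_size)]

    eval_set = [make_sample(rng, label) for label in (0, 1) for _ in range(16)]
    w = [0.0] * 4
    start, _ = batch_loss_and_grad(w, eval_set)
    for _ in range(steps):
        w = sast_step(w, draw(), draw(), lr=lr, rho=rho)  # two independent minibatches
    end, _ = batch_loss_and_grad(w, eval_set)
    return w, start, end
```

For example, `train(rho=0.05)` returns the learned weights with the evaluation loss before and after training, and `accuracy(w, data, hard=True)` gives a toy analogue of the hard-spike transfer measurement. Setting `rho=0.0` reduces `sast_step` to plain minibatch gradient descent on `batch2`. The $\rho$ values reported above (0.30 and 0.40) were tuned for the paper's networks and need not carry over to this toy scale.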
