Training spiking neural networks to approximate complex functions is essential for studying information processing in the brain and neuromorphic computing. Yet, the binary nature of spikes constitutes a challenge for direct gradient-based training. To sidestep this problem, surrogate gradients have proven empirically successful, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to lack of support for automatic differentiation, are impractical for training deep spiking neural networks, yet provide gradients equivalent to surrogate gradients in single neurons. On the other hand, we examine stochastic automatic differentiation, which is compatible with discrete randomness but has never been applied to spiking neural network training. We find that the latter provides the missing theoretical basis for surrogate gradients in stochastic spiking neural networks. We further show that surrogate gradients in deterministic networks correspond to a particular asymptotic case and numerically confirm the effectiveness of surrogate gradients in stochastic multi-layer spiking neural networks. Finally, we illustrate that surrogate gradients are not conservative fields and, thus, not gradients of a surrogate loss. Our work provides the missing theoretical foundation for surrogate gradients and an analytically well-founded solution for end-to-end training of stochastic spiking neural networks.
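The single-neuron equivalence mentioned in the abstract can be illustrated with a small numerical sketch. It assumes a sigmoidal escape-noise model in which a neuron with membrane potential `u` spikes with probability `sigmoid(beta * (u - theta))`. The names `spike_probability`, `surrogate_derivative`, `theta`, and `beta`, and the choice of a sigmoid, are illustrative assumptions and are not taken from the paper. Under that assumption, the derivative of the expected spike output with respect to `u` coincides with a sigmoid-derivative surrogate, which the code checks by finite differences.

```python
import math


def spike_probability(u, theta=1.0, beta=5.0):
    """Assumed escape-noise model: P(spike | u) = sigmoid(beta * (u - theta))."""
    return 1.0 / (1.0 + math.exp(-beta * (u - theta)))


def surrogate_derivative(u, theta=1.0, beta=5.0):
    """Sigmoid-derivative surrogate, used in place of the Heaviside derivative."""
    p = spike_probability(u, theta, beta)
    return beta * p * (1.0 - p)


def expected_output_derivative(u, theta=1.0, beta=5.0, eps=1e-6):
    """Central finite difference of the expected output E[S | u] = P(spike | u)."""
    upper = spike_probability(u + eps, theta, beta)
    lower = spike_probability(u - eps, theta, beta)
    return (upper - lower) / (2.0 * eps)


def max_abs_gap(potentials):
    """Largest absolute difference between the two derivatives over the inputs."""
    return max(
        abs(surrogate_derivative(u) - expected_output_derivative(u))
        for u in potentials
    )


# Membrane potentials below, at, and above the assumed threshold theta = 1.0.
potentials = [0.0, 0.5, 0.9, 1.0, 1.1, 1.5, 2.0]
gap = max_abs_gap(potentials)
print(f"max |surrogate - d E[S]/du| = {gap:.2e}")
```

The sketch covers only one neuron and one time step. It does not address the multi-layer case, for which the abstract points to stochastic automatic differentiation as the theoretical basis.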