Stabilizing the LIF Neuron Training

Spiking Neuromorphic Computing uses binary activity to improve the energy efficiency of Artificial Intelligence. However, because binary activity is non-smooth, training requires approximate gradients, known as Surrogate Gradients (SG), to close the performance gap with Deep Learning. Several SGs have been proposed in the literature, but it remains unclear how to determine the best SG for a given task and network. Good performance can be achieved with most SG shapes, but only after a costly hyper-parameter search. We therefore aim to define, experimentally and theoretically, the best SG across different stress tests, in order to reduce the future need for grid search. To establish the gap that this line of work addresses, we show that more complex tasks and networks require a more careful choice of SG, even though overall the derivative of the fast sigmoid outperforms other SGs across tasks and networks, for a wide range of learning rates. We then design a stability-based theoretical method to choose the initialization and SG shape before training on the most common spiking architecture, the Leaky Integrate and Fire (LIF). Since our stability method suggests the use of high firing rates at initialization, which is non-standard in the neuromorphic literature, we show that high initial firing rates, combined with a gradually introduced sparsity-encouraging loss term, can lead to better generalization, depending on the SG shape. Our stability-based theoretical solution finds an SG and an initialization that experimentally result in improved accuracy. We show how it can be used to reduce the need for extensive grid search over the dampening, sharpness and tail-fatness of the SG.
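To make the objects named in the abstract concrete, the following is a minimal sketch of a discrete-time LIF neuron and of the derivative of the fast sigmoid used as an SG. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the decay value, the subtractive reset, and the parameterization dampening / (1 + sharpness * |v - threshold|)^2 are all hypothetical choices, and tail-fatness is not modeled.

```python
def fast_sigmoid_sg(v, threshold=1.0, dampening=1.0, sharpness=1.0):
    """Surrogate for the derivative of the spike function with respect to v.

    Assumed form: dampening / (1 + sharpness * |v - threshold|)^2,
    i.e. a scaled derivative of the fast sigmoid x / (1 + |x|).
    """
    x = sharpness * (v - threshold)
    return dampening / (1.0 + abs(x)) ** 2


def lif_forward(inputs, decay=0.9, threshold=1.0):
    """Simulate a single LIF neuron; return (spikes, pre-reset voltages)."""
    v = 0.0
    spikes, voltages = [], []
    for current in inputs:
        v = decay * v + current         # leaky integration of the input
        voltages.append(v)
        s = 1 if v >= threshold else 0  # binary (Heaviside) spike
        spikes.append(s)
        v -= s * threshold              # assumed reset: subtract the threshold
    return spikes, voltages


# Forward pass on a short, hand-picked input sequence.
spikes, voltages = lif_forward([0.6, 0.6, 0.0, 0.6, 0.6])

# In the backward pass, the zero-almost-everywhere derivative of the spike
# would be replaced by these surrogate values, evaluated at the voltages.
surrogate = [fast_sigmoid_sg(v) for v in voltages]
```

In this sketch the dampening sets the height of the SG at the threshold and the sharpness controls how quickly it decays away from the threshold, which are two of the shape parameters that would otherwise be tuned by grid search.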
