Stability arguments are often used to prevent learning algorithms from developing ever-increasing activity and weights that hinder generalization. However, stability conditions can clash with the sparsity required to improve the energy efficiency of spiking neurons. Nonetheless, stability arguments can also provide solutions. Spiking Neuromorphic Computing uses binary activity to improve the energy efficiency of Artificial Intelligence, but the non-smoothness of binary activity requires approximate gradients, known as Surrogate Gradients (SG), to close the performance gap with Deep Learning. Several SGs have been proposed in the literature, but it remains unclear how to determine the best SG for a given task and network. Thus, we aim to theoretically define the best SG through stability arguments, to reduce the need for grid search. We show that more complex tasks and networks need a more careful choice of SG, even if overall the derivative of the fast sigmoid tends to outperform the others for a wide range of learning rates. We therefore design a stability-based theoretical method to choose the initialization and SG shape before training on the most common spiking neuron, the Leaky Integrate and Fire (LIF). Since our stability method suggests the use of high firing rates at initialization, which is non-standard in the neuromorphic literature, we show that high initial firing rates, combined with a sparsity-encouraging loss term introduced gradually, can lead to better generalization, depending on the SG shape. Our stability-based theoretical solution finds an SG and initialization that experimentally result in improved accuracy. We show how it can be used to reduce the need for extensive grid search over the dampening, sharpness and tail-fatness of the SG. We also show that our stability concepts can be extended to different LIF variants, such as DECOLLE and fluctuations-driven initializations.
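To make the terms above concrete, the following is a minimal illustrative sketch of a LIF layer with binary spikes, together with the derivative of the fast sigmoid used as an SG. It is not the paper's implementation: the function names, the hard reset to zero, the `decay` and `threshold` values, the random input currents, and the exact way `dampening` and `sharpness` enter the SG are all assumptions chosen for illustration.

```python
import numpy as np


def fast_sigmoid_sg(v, dampening=1.0, sharpness=1.0):
    """Derivative of the fast sigmoid x / (1 + |x|), used as a surrogate gradient.

    Assumed parametrization: `dampening` scales the peak height and
    `sharpness` narrows the curve around the threshold.
    """
    return dampening / (1.0 + sharpness * np.abs(v)) ** 2


def lif_forward(inputs, decay=0.9, threshold=1.0):
    """Simulate a layer of LIF neurons over time.

    inputs: array of shape (T, N) with the input current per step and neuron.
    Returns the binary spikes and the threshold-centered voltages, both (T, N).
    """
    n_steps, n_neurons = inputs.shape
    v = np.zeros(n_neurons)
    spikes = np.zeros((n_steps, n_neurons))
    centered = np.zeros((n_steps, n_neurons))
    for t in range(n_steps):
        v = decay * v + inputs[t]            # leaky integration
        centered[t] = v - threshold          # distance to the firing threshold
        spikes[t] = (centered[t] > 0).astype(float)  # non-smooth binary activity
        v = v * (1.0 - spikes[t])            # assumed hard reset to zero after a spike
    return spikes, centered


# Hypothetical input currents, only to exercise the functions.
rng = np.random.default_rng(0)
currents = rng.normal(0.3, 0.5, size=(100, 8))

spikes, centered = lif_forward(currents)
# In the backward pass, the SG replaces the derivative of the spike function,
# which is zero almost everywhere.
sg = fast_sigmoid_sg(centered, dampening=0.3, sharpness=1.0)
firing_rate = spikes.mean()
```

In this sketch, `firing_rate` is the quantity that an initialization scheme would raise or lower, and `dampening` and `sharpness` are the SG hyperparameters that would otherwise be tuned by grid search.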