Communication by binary and sparse spikes is a key factor for the energy efficiency of biological brains. However, training deep spiking neural networks (SNNs) with backpropagation is harder than training artificial neural networks (ANNs), which is puzzling given that recent theoretical results provide exact mapping algorithms from ReLU networks to time-to-first-spike (TTFS) SNNs. Building upon these results, we analyze the learning dynamics of TTFS-SNNs in theory and in simulation. Our analysis highlights that even when an SNN can be mapped exactly to a ReLU network, it cannot always be robustly trained by gradient descent. The reason is a specific instance of the vanishing-or-exploding gradient problem, which biases the gradient descent trajectory relative to that of the equivalent ANN. After identifying this issue, we derive a generic solution for network initialization and SNN parameterization that guarantees that the SNN can be trained as robustly as its ANN counterpart. Our theoretical findings are illustrated in practice on image classification datasets. Our method achieves the same accuracy as deep ConvNets on CIFAR10 and enables fine-tuning on the much larger PLACES365 dataset without loss of accuracy compared to the ANN. We argue that the combined perspective of conversion and fine-tuning with robust gradient descent in SNNs will be decisive for optimizing SNNs for hardware implementations that require low latency and resilience to noise and quantization.
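To make the vanishing-or-exploding gradient problem concrete, the following is a minimal NumPy sketch on the ANN side only. It is not the TTFS-SNN model, nor the initialization and parameterization derived in this work. The function name `backprop_gradient_norm`, the `gain` parameter, and the chosen depth and width are illustrative assumptions. The sketch backpropagates a unit-norm gradient through a random fully connected ReLU network whose weights are drawn with standard deviation `gain * sqrt(2 / width)`, so that `gain = 1` corresponds to the standard He initialization.

```python
import numpy as np


def backprop_gradient_norm(gain, depth=30, width=256, seed=0):
    """Norm of the gradient reaching the input of a random ReLU network.

    Weights are i.i.d. Gaussian with std = gain * sqrt(2 / width), so
    gain == 1 is He initialization (an assumption of this sketch, not
    the scheme proposed in the text above).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    weights, masks = [], []

    # Forward pass: keep each weight matrix and the ReLU activity mask.
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * gain * np.sqrt(2.0 / width)
        pre = w @ x
        mask = pre > 0
        x = pre * mask
        weights.append(w)
        masks.append(mask)

    # Backward pass: start from a unit-norm gradient at the output and
    # multiply by the transposed layer Jacobians, last layer first.
    g = np.ones(width) / np.sqrt(width)
    for w, mask in zip(reversed(weights), reversed(masks)):
        g = w.T @ (g * mask)
    return float(np.linalg.norm(g))


# Compare a too-small, a balanced, and a too-large weight scale.
norms = {gain: backprop_gradient_norm(gain) for gain in (0.5, 1.0, 2.0)}
for gain, norm in norms.items():
    print(f"gain={gain}: input-gradient norm = {norm:.3e}")
```

In this toy setting the gradient norm scales roughly like `gain ** depth`, so a modest mismatch in weight scale is amplified to many orders of magnitude over 30 layers, while the balanced scale keeps the norm of order one. This shows only the generic ANN phenomenon that the abstract refers to; the SNN-specific form of the problem depends on the TTFS parameterization and is not modeled here.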