Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks

The Spiking Neural Network (SNN) is a biologically inspired neural network architecture that has recently garnered significant attention. It transmits information with binary spike activations, which replaces multiplications with additions and yields high energy efficiency. However, training an SNN directly is challenging because the gradient of the spike-firing process is undefined. Prior works have employed various surrogate gradient training methods, which substitute an alternative function for the firing process during back-propagation, but these approaches ignore an intrinsic problem: gradient vanishing. To address this issue, we propose a shortcut back-propagation method that transmits the gradient directly from the loss to the shallow layers, thereby significantly mitigating the gradient vanishing problem. This method introduces no additional burden during the inference phase. To strike a balance between final accuracy and ease of training, we also propose an evolutionary training framework, implemented by introducing a balance coefficient that changes dynamically with the training epoch, which further improves the network's performance. Extensive experiments on static and dynamic datasets with several popular network structures show that our method consistently outperforms state-of-the-art methods.
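The balance coefficient described above can be illustrated with a minimal Python sketch. The abstract does not give the schedule or the exact form of the loss, so everything here is an assumption made for illustration: the names `balance_coefficient` and `combined_loss` are hypothetical, the linear decay is a guess, and the shortcut branches are represented only by their already-computed loss values.

```python
def balance_coefficient(epoch: int, total_epochs: int) -> float:
    """Hypothetical schedule: decay linearly from 1.0 at the first epoch
    to 0.0 at the last epoch. The paper's actual schedule may differ."""
    if total_epochs <= 1:
        return 0.0
    value = 1.0 - epoch / (total_epochs - 1)
    # Clamp so out-of-range epochs cannot produce a coefficient outside [0, 1].
    return min(1.0, max(0.0, value))


def combined_loss(main_loss, shortcut_losses, epoch, total_epochs):
    """Assumed form: the main-output loss plus the shortcut-branch losses,
    with the shortcut term weighted by the epoch-dependent coefficient."""
    lam = balance_coefficient(epoch, total_epochs)
    return main_loss + lam * sum(shortcut_losses)


if __name__ == "__main__":
    # Early epochs weight the shortcut losses heavily (easier training);
    # late epochs rely on the main loss alone (final accuracy).
    for epoch in (0, 5, 9):
        print(epoch, combined_loss(1.0, [0.5, 0.5], epoch, 10))
```

Under this assumed schedule, the shortcut branches dominate early training, when gradients reaching the shallow layers matter most, and fade out by the end, which is consistent with the claim that nothing extra is needed at inference time.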
