Recurrent Neural Networks (RNNs) are well suited to temporal sequence tasks. However, training RNNs involves dense matrix multiplications, which require hardware that can support a large number of arithmetic operations and memory accesses. Online training of RNNs on the edge therefore calls for algorithms optimized for efficient hardware deployment. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping hidden-state updates from inactive neurons, that is, neurons whose change in activation across two timesteps is below a defined threshold. This work describes a training algorithm for Delta RNNs that also exploits temporal sparsity in the backward propagation phase, reducing the computational requirements of training on the edge. Because the forward and backward computation graphs are symmetric during training, the gradient computation for inactive neurons can be skipped. Results show a reduction of $\sim$80% in matrix operations when training a 56k-parameter Delta LSTM on the Fluent Speech Commands dataset, with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show a 2-10× speedup in matrix computations over an activation sparsity range of 50%-90%. Additionally, we show that the proposed Delta RNN training is useful for online incremental learning on edge devices with limited computing resources.
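The skipping idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it covers only a single delta matrix-vector product rather than a full Delta LSTM, it ignores gradients with respect to the inputs, and the function names (`delta_encode`, `delta_matvec_forward`, `delta_matvec_backward`) are hypothetical. The point it tries to show is that the columns skipped in the forward pass are the same columns that can be skipped when accumulating the weight gradient.

```python
import numpy as np

def delta_encode(x_seq, threshold):
    """Thresholded deltas of a sequence (hypothetical helper).

    x_ref holds, per element, the last value that was actually propagated.
    Elements whose change stays at or below the threshold emit a zero delta.
    """
    x_ref = np.zeros_like(x_seq[0])
    deltas = []
    for x in x_seq:
        diff = x - x_ref
        active = np.abs(diff) > threshold
        delta = np.where(active, diff, 0.0)
        x_ref = x_ref + delta  # only active elements update the reference
        deltas.append(delta)
    return np.stack(deltas)

def delta_matvec_forward(W, deltas):
    """Accumulate m_t = m_{t-1} + W @ delta_t using only the active columns."""
    m = np.zeros(W.shape[0])
    outputs = []
    for delta in deltas:
        idx = np.nonzero(delta)[0]       # columns with a nonzero delta
        m = m + W[:, idx] @ delta[idx]   # inactive columns are never touched
        outputs.append(m)
    return np.stack(outputs)

def delta_matvec_backward(W, deltas, grad_outputs):
    """Weight gradient for the loss sum_t <grad_outputs[t], m_t>.

    Because m_t accumulates over time, the gradient reaching step t is the
    sum of grad_outputs from t onward. The outer product is formed only for
    columns that were active in the forward pass at step t.
    """
    grad_W = np.zeros_like(W)
    grad_m = np.zeros(W.shape[0])
    for t in range(len(deltas) - 1, -1, -1):
        grad_m = grad_m + grad_outputs[t]
        idx = np.nonzero(deltas[t])[0]   # same sparsity pattern as forward
        grad_W[:, idx] += np.outer(grad_m, deltas[t][idx])
    return grad_W
```

With a threshold of zero the forward result matches the dense product `W @ x_t` at every step (up to floating-point error); raising the threshold zeroes more deltas, so both loops touch fewer columns of `W`.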