Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

Recurrent neural networks (RNNs) are well suited for solving sequence tasks in resource-constrained systems due to their expressivity and low computational requirements. However, a gap remains between the efficiency and performance that RNNs currently achieve and the requirements of real-world applications. Two factors make training and using RNNs inefficient: the memory and computation needed to propagate the activations of all neurons at every time step to every connected neuron, and the sequential dependence of activations. We propose a solution inspired by biological neuron dynamics that makes the communication between RNN units sparse and discrete. This also makes the backward pass with backpropagation through time (BPTT) computationally sparse and efficient. We base our model on the gated recurrent unit (GRU), extending it with units that emit discrete events for communication. Events are triggered by a threshold, so no information is communicated to other units in the absence of events. We show theoretically that the communication between units, and hence the computation required for both the forward and backward passes, scales with the number of events in the network. Our model achieves efficiency without compromising task performance, demonstrating competitive performance compared to state-of-the-art recurrent network models in real-world tasks, including language modeling. The dynamic activity sparsity mechanism also makes our model well suited for novel energy-efficient neuromorphic hardware. Code is available at https://github.com/KhaleelKhan/EvNN/.
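To make the thresholded communication concrete, the following is a minimal sketch of one forward step of a GRU-style cell whose units only communicate when their state exceeds a threshold. It is an illustration under stated assumptions, not the EvNN implementation: the function name `event_gru_step`, the weight layout, the gate ordering, and the choice to apply the reset gate after the recurrent product are all hypothetical, and the sketch omits the backward pass and any state reset that may follow an event. The returned counter shows that the recurrent work in this sketch grows with the number of events rather than with the number of units squared.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def event_gru_step(x, c_prev, y_prev, W, U, b, theta):
    """One step of a thresholded GRU-style cell (illustrative sketch).

    W[g][i][k], U[g][i][j] and b[g][i] hold the input weights, recurrent
    weights and biases for gate g (0: update, 1: reset, 2: candidate).
    These names and this layout are assumptions made for the example.
    Thresholds in theta are assumed positive, so an event is never 0.0.

    Returns (c, y, macs): the new internal state, the emitted events, and
    the number of recurrent multiply-accumulates actually performed.
    """
    n_units = len(c_prev)
    # Indices of units that emitted an event at the previous step.
    active = [j for j, v in enumerate(y_prev) if v != 0.0]
    c, y, macs = [], [], 0
    for i in range(n_units):
        # Dense input contribution for each of the three gates.
        inp = [sum(w * xk for w, xk in zip(W[g][i], x)) for g in range(3)]
        # Recurrent contribution: only units with an event are visited.
        rec = [sum(U[g][i][j] * y_prev[j] for j in active) for g in range(3)]
        macs += 3 * len(active)
        u = sigmoid(inp[0] + rec[0] + b[0][i])           # update gate
        r = sigmoid(inp[1] + rec[1] + b[1][i])           # reset gate
        cand = math.tanh(inp[2] + r * rec[2] + b[2][i])  # candidate state
        c_i = u * c_prev[i] + (1.0 - u) * cand
        c.append(c_i)
        # Emit the state as an event only if it crosses the threshold;
        # otherwise nothing is communicated to other units.
        y.append(c_i if c_i > theta[i] else 0.0)
    return c, y, macs
```

With all previous outputs at zero, the recurrent sums are skipped entirely and `macs` is 0; with `k` events among `n` units, the step performs `3 * n * k` recurrent multiply-accumulates instead of `3 * n * n`.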