Backpropagation through time (BPTT) is the standard algorithm for training recurrent neural networks (RNNs). It requires separate forward and backward simulation phases, for inference and learning respectively. Moreover, BPTT must store the complete history of network states between the two phases, so its memory consumption grows in proportion to the input sequence length. This makes BPTT unsuited to online learning and difficult to implement on low-resource real-time systems. Real-Time Recurrent Learning (RTRL) allows online learning, and its memory requirement is independent of sequence length. However, RTRL has a computational cost that grows with the fourth power of the state size, which makes it intractable for all but the smallest networks. In this work, we show that recurrent networks exhibiting high activity sparsity can reduce the computational cost of RTRL. Moreover, combining activity sparsity with parameter sparsity can yield savings in computation and memory large enough to make RTRL practical. Unlike previous work, this improvement in the efficiency of RTRL is achieved without any approximation of the learning process.
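To make the cost argument concrete, the following is a minimal NumPy sketch, not the model or implementation used in this work. It assumes a plain ReLU recurrent layer h_t = relu(W h_{t-1} + x_t) as a stand-in for an activity-sparse network, and the function names rtrl_step_dense and rtrl_step_sparse are hypothetical. The influence matrix M[k, i, j] = dh[k]/dW[i, j] has n^3 entries regardless of sequence length, and its dense update costs on the order of n^4 multiply-adds. Rows of M belonging to units with zero derivative are exactly zero, so the sparse variant skips them and still returns the same result without approximation.

```python
import numpy as np

def rtrl_step_dense(W, h_prev, x_in, M_prev):
    """One RTRL step for h = relu(W @ h_prev + x_in), with M[k, i, j] = dh[k]/dW[i, j]."""
    n = W.shape[0]
    a = W @ h_prev + x_in
    d = (a > 0).astype(W.dtype)                    # ReLU derivative per unit
    # Recurrent term: sum_l W[k, l] * M_prev[l, i, j], about n^4 multiply-adds.
    M = np.einsum("kl,lij->kij", W, M_prev)
    # Immediate term: da[k]/dW[i, j] = delta(k, i) * h_prev[j].
    idx = np.arange(n)
    M[idx, idx, :] += h_prev
    M *= d[:, None, None]                          # inactive units get zero rows
    return np.maximum(a, 0.0), M

def rtrl_step_sparse(W, h_prev, x_in, M_prev, prev_active):
    """Same update, computed only over active units (assumption: ReLU activity sparsity).

    prev_active must contain every index l for which M_prev[l] is nonzero.
    """
    n = W.shape[0]
    a = W @ h_prev + x_in
    active = np.flatnonzero(a > 0)                 # units with nonzero derivative
    M = np.zeros_like(M_prev)
    # Only active rows k and previously active rows l contribute:
    # cost is about len(active) * len(prev_active) * n^2 multiply-adds.
    W_sub = W[np.ix_(active, prev_active)]
    M_sub = M_prev[prev_active].reshape(len(prev_active), n * n)
    M[active] = (W_sub @ M_sub).reshape(len(active), n, n)
    M[active, active, :] += h_prev
    return np.maximum(a, 0.0), M, active
```

Starting from M = 0 and prev_active = np.arange(n), both functions can be stepped over an input stream one sample at a time; the gradient of an instantaneous loss L with respect to W is then np.einsum("k,kij->ij", dL_dh, M), with no stored state history. Parameter sparsity, which would additionally shrink the (i, j) axes of M, is not covered by this sketch.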