Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. Importantly, we also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.
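To illustrate why element-wise recurrence makes exact RTRL tractable, the following is a minimal sketch rather than the architecture studied in this work. It assumes a hypothetical single-layer cell h_t = tanh(W x_t + u * h_{t-1}), where u is a vector of per-unit recurrent weights, and a per-step squared-error loss; the function name and all variable names are illustrative. Because unit i of the state depends only on row i of W and on u[i], the sensitivity of the state with respect to the parameters has the same shape as the parameters themselves, and it can be carried forward in time without storing past activations.

```python
import numpy as np

def rtrl_elementwise(W, u, xs, ys):
    """Exact RTRL gradients for the assumed cell h_t = tanh(W x_t + u * h_{t-1}).

    Loss (an assumption for this sketch): sum over t of 0.5 * ||h_t - y_t||^2.
    Returns (loss, dL/dW, dL/du), accumulated online, one step at a time.
    """
    n, m = W.shape
    h = np.zeros(n)
    S_W = np.zeros((n, m))  # S_W[i, j] = d h_t[i] / d W[i, j]
    S_u = np.zeros(n)       # S_u[i]    = d h_t[i] / d u[i]
    gW = np.zeros_like(W)
    gu = np.zeros_like(u)
    loss = 0.0
    for x, y in zip(xs, ys):
        h_new = np.tanh(W @ x + u * h)
        d = 1.0 - h_new ** 2  # derivative of tanh at the pre-activation
        # Forward sensitivity recursion; `h` still holds h_{t-1} here.
        S_W = d[:, None] * (x[None, :] + u[:, None] * S_W)
        S_u = d * (h + u * S_u)
        h = h_new
        # The gradient of the current step's loss is available immediately.
        err = h - y
        loss += 0.5 * float(err @ err)
        gW += err[:, None] * S_W
        gu += err * S_u
    return loss, gW, gu
```

In this sketch the extra state (`S_W`, `S_u`) costs as much memory as the parameters, whereas for a fully recurrent layer the corresponding sensitivity tensor has one entry per pair of hidden unit and parameter. Stacking several such layers reintroduces dependencies across layers, which is the multi-layer limitation mentioned above.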