Nearly all real-world tasks are inherently partially observable, necessitating the use of memory in Reinforcement Learning (RL). Most model-free approaches summarize the trajectory into a latent Markov state using memory models borrowed from Supervised Learning (SL), even though RL tends to exhibit different training and efficiency characteristics. To address this discrepancy, we introduce Fast and Forgetful Memory, an algorithm-agnostic memory model designed specifically for RL. Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, and it achieves greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory trains two orders of magnitude faster than RNNs, which we attribute to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm.
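The abstract does not say how logarithmic time is obtained. As an illustration only, the sketch below shows one common route: a memory update of the form h_t = a_t * h_{t-1} + b_t is an affine map, affine maps compose associatively, and so all states can be computed with a parallel scan in about log2(n) rounds instead of n sequential steps. This is not the Fast and Forgetful Memory implementation (see the repository linked above for that). The recurrence form and the names `combine`, `sequential_states`, and `scan_states` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (a, b) stands for the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(first: Pair, second: Pair) -> Pair:
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    # second(first(h)) = a2 * (a1 * h + b1) + b2
    return (a1 * a2, a2 * b1 + b2)


def sequential_states(
    decays: Sequence[float], inputs: Sequence[float], h0: float = 0.0
) -> List[float]:
    """Reference RNN-style loop: n dependent steps."""
    h = h0
    states = []
    for a, b in zip(decays, inputs):
        h = a * h + b
        states.append(h)
    return states


def scan_states(
    decays: Sequence[float], inputs: Sequence[float], h0: float = 0.0
) -> Tuple[List[float], int]:
    """Hillis-Steele inclusive scan over affine maps.

    Returns the states and the number of rounds. The updates inside one
    round do not depend on each other, so on parallel hardware each round
    can run in constant time, giving roughly log2(n) depth. Total work is
    O(n log n) and storage is O(n).
    """
    elems: List[Pair] = list(zip(decays, inputs))
    n = len(elems)
    rounds = 0
    step = 1
    while step < n:
        # Each position merges with the partial result `step` places earlier.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(n)
        ]
        step *= 2
        rounds += 1
    # elems[i] now maps h0 directly to the state after step i.
    return [a * h0 + b for a, b in elems], rounds


if __name__ == "__main__":
    decays = [0.5] * 8   # per-step forgetting factors (illustrative values)
    inputs = [1.0] * 8   # per-step inputs (illustrative values)
    states, rounds = scan_states(decays, inputs)
    print(rounds)
    print(states == sequential_states(decays, inputs))
```

For a sequence of length 8 the scan finishes in 3 rounds, whereas the loop needs 8 dependent steps. The Python list comprehension here still runs serially, so the sketch demonstrates the dependency structure, not an actual speedup.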