The use of transformers in offline reinforcement learning has recently become a rapidly developing area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events, the model's context must capture both the event itself and the decision point. However, the attention mechanism's quadratic complexity in sequence length limits how far the context can be extended. One solution to this problem is to enhance transformers with memory mechanisms. In this paper, we propose the Recurrent Action Transformer with Memory (RATE), a model that incorporates recurrent memory. We evaluate the model in extensive experiments on memory-intensive environments (VizDoom-Two-Color, T-Maze) as well as classic Atari games and MuJoCo control environments. The results show that the use of memory can significantly improve performance in memory-intensive environments while maintaining or improving results in classic environments. We hope that our findings will stimulate research on memory mechanisms for transformers applicable to offline reinforcement learning.