Transformer models, originally developed for natural language problems, have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, which reduces the whole task to sequence modeling. However, the quadratic complexity of the transformer's attention operation limits how far the context length can be increased. For this reason, various memory mechanisms are used to handle long sequences in natural language processing. In this paper, we propose the Recurrent Memory Decision Transformer (RMDT), a model that uses a recurrent memory mechanism for reinforcement learning problems. We conduct thorough experiments on Atari games and MuJoCo control problems and show that, on Atari games, the proposed model significantly outperforms its counterparts without the recurrent memory mechanism. We also carefully study the effect of memory on the performance of the proposed model. These findings shed light on the potential of incorporating recurrent memory mechanisms to improve the performance of large-scale transformer models in offline reinforcement learning tasks. The Recurrent Memory Decision Transformer code is publicly available at \url{https://anonymous.4open.science/r/RMDT-4FE4}.