Originally developed for natural language problems, transformer models have recently been widely adopted for offline reinforcement learning. An agent's history can be represented as a sequence, so the whole task reduces to sequence modeling. However, the quadratic complexity of the transformer's attention operation in sequence length limits how far the context can be extended. In natural language processing, various memory mechanisms are therefore used to handle long sequences. This paper proposes the Recurrent Memory Decision Transformer (RMDT), a model that uses a recurrent memory mechanism for reinforcement learning problems. We conduct thorough experiments on Atari games and MuJoCo control problems and show that the proposed model significantly outperforms its counterparts without the recurrent memory mechanism on Atari games. We also carefully study the effect of memory on the performance of the proposed model. These findings shed light on the potential of recurrent memory mechanisms to improve the performance of large-scale transformer models in offline reinforcement learning tasks. The Recurrent Memory Decision Transformer code is publicly available in the repository \url{https://anonymous.4open.science/r/RMDT-4FE4}.