The Dreamer agent provides several benefits of Model-Based Reinforcement Learning (MBRL), such as sample efficiency, reusable knowledge, and safe planning. However, its world model and policy networks inherit the limitations of recurrent neural networks, raising an important question: how can an MBRL framework benefit from recent advances in transformers, and what are the challenges in doing so? In this paper, we propose a transformer-based MBRL agent called TransDreamer. We first introduce the Transformer State-Space Model, a world model that leverages a transformer for dynamics prediction. We then share this world model with a transformer-based policy network and achieve stable training of a transformer-based RL agent. In experiments, we apply the proposed model to 2D visual RL and 3D first-person visual RL tasks, both of which require long-range memory access for memory-based reasoning. We show that the proposed model outperforms Dreamer on these complex tasks.
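The core idea of the Transformer State-Space Model can be illustrated with a minimal sketch: a causal self-attention layer attends over the full history of embedded (state, action) features, so the prediction at each step can draw on long-range memory rather than a compressed recurrent state. The sketch below is a toy, single-head illustration with NumPy; all dimensions, variable names, and the single-layer structure are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Toy sketch of the transformer-dynamics idea: one causal self-attention
# head over an embedded (state, action) history. Hypothetical dimensions;
# a real TSSM would stack layers and add stochastic latent heads.

rng = np.random.default_rng(0)
T, D = 8, 16          # history length, embedding dimension

def causal_self_attention(x, Wq, Wk, Wv):
    """One attention head with a causal mask: step t sees only steps <= t."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores[mask] = -np.inf                      # hide future time steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Embedded history of interleaved state/action features (random toy data).
history = rng.standard_normal((T, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

hidden = causal_self_attention(history, Wq, Wk, Wv)
next_state_pred = hidden[-1]   # last position summarizes the whole history
print(next_state_pred.shape)   # (16,)
```

Unlike a recurrent world model, where the history is squeezed through a fixed-size hidden state at every step, each position here can attend directly to any earlier step, which is why a transformer backbone is attractive for the long-range memory tasks the paper targets.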