World models are a fundamental component in model-based reinforcement learning (MBRL). To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limited memory capacity. In this paper, we explore alternative world model backbones for improving long-term memory. In particular, we investigate the effectiveness of Transformers and Structured State Space Sequence (S4) models, motivated by their remarkable ability to capture long-range dependencies in low-dimensional sequences and their complementary strengths. We propose S4WM, the first world model compatible with parallelizable state space models (SSMs), including S4 and its variants. By incorporating latent variable modeling, S4WM can efficiently generate high-dimensional image sequences through latent imagination. Furthermore, we extensively compare RNN-, Transformer-, and S4-based world models across four sets of environments, which we have tailored to assess crucial memory capabilities of world models, including long-term imagination, context-dependent recall, reward prediction, and memory-based reasoning. Our findings demonstrate that S4WM outperforms Transformer-based world models in terms of long-term memory, while exhibiting greater efficiency during training and imagination. These results pave the way for the development of stronger MBRL agents.
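To make "parallelizable SSMs" concrete, the following is a minimal NumPy sketch of a generic discrete linear state space model, not the S4WM architecture or S4's actual kernel computation. It shows that the same outputs can be produced either by a step-by-step recurrence (as an RNN would) or by a single causal convolution over the whole sequence, which is the property that allows training to be parallelized over time. The function names, the diagonal state matrix, and the sizes are assumptions chosen for illustration.

```python
import numpy as np


def ssm_recurrent(A, B, C, u):
    """Sequential scan: x_k = A x_{k-1} + B u_k, y_k = C x_k, starting from x = 0."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)


def ssm_convolutional(A, B, C, u):
    """Same outputs as a causal convolution with kernel K_j = C A^j B."""
    L = len(u)
    K = np.empty(L)
    A_pow_B = B.copy()  # holds A^j B, starting at j = 0
    for j in range(L):
        K[j] = C @ A_pow_B
        A_pow_B = A @ A_pow_B
    # Full convolution truncated to the first L outputs is the causal part.
    return np.convolve(u, K)[:L]


# Hypothetical toy setup: 4-dimensional state, scalar input sequence of length 32.
rng = np.random.default_rng(0)
N, L = 4, 32
A = np.diag(rng.uniform(0.1, 0.9, size=N))  # stable diagonal state matrix (assumption)
B = rng.normal(size=N)
C = rng.normal(size=N)
u = rng.normal(size=L)

y_rec = ssm_recurrent(A, B, C, u)
y_conv = ssm_convolutional(A, B, C, u)
print(np.allclose(y_rec, y_conv))
```

The recurrent form is the natural one for step-by-step imagination, while the convolutional form processes a whole training sequence at once. The kernel here is built with a naive loop purely for clarity; S4 and its variants rely on structured parameterizations to compute it efficiently.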