World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become the dominant approach for image generation, challenging well-established methods that model discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual detail can lead to improved agent performance. DIAMOND achieves a mean human-normalized score of 1.46 on the competitive Atari 100k benchmark, a new best for agents trained entirely within a world model. We further demonstrate that DIAMOND's diffusion world model can stand alone as an interactive neural game engine by training on static Counter-Strike: Global Offensive gameplay. To foster future research on diffusion for world modeling, we release our code, agents, videos and playable world models at https://diamond-wm.github.io.