World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there are often noncontrollable dynamics that are independent of, or only sparsely dependent on, the action signals, which makes it challenging to learn effective world models. To address this issue, we propose Iso-Dream++, a model-based reinforcement learning approach with two main contributions. First, we optimize the inverse dynamics to encourage the world model to isolate controllable state transitions from the mixed spatiotemporal variations of the environment. Second, we perform policy optimization on the decoupled latent imaginations: we roll out noncontrollable states into the future and adaptively associate them with the current controllable state. This allows long-horizon visuomotor control tasks to benefit from isolating mixed sources of dynamics in the wild; for example, a self-driving car can anticipate the movement of other vehicles and thereby avoid potential risks. Building on our previous work, we further consider the sparse dependencies between controllable and noncontrollable states, address the training collapse problem of state decoupling, and validate our approach in transfer learning setups. Our empirical study demonstrates that Iso-Dream++ significantly outperforms existing reinforcement learning models on CARLA and DeepMind Control.