World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there often exist noncontrollable dynamics that are independent of, or only sparsely dependent on, the action signals, which makes it challenging to learn effective world models. To address this issue, we propose Iso-Dream++, a model-based reinforcement learning approach with two main contributions. First, we optimize the inverse dynamics to encourage the world model to isolate controllable state transitions from the mixed spatiotemporal variations of the environment. Second, we perform policy optimization based on the decoupled latent imaginations, where we roll out noncontrollable states into the future and adaptively associate them with the current controllable state. This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild; for example, a self-driving car can anticipate the movement of other vehicles and thereby avoid potential risks. Building on our previous work, we further consider the sparse dependencies between controllable and noncontrollable states, address the training collapse problem of state decoupling, and validate our approach in transfer learning setups. Our empirical study demonstrates that Iso-Dream++ significantly outperforms existing reinforcement learning models on CARLA and DeepMind Control.