In autonomous driving with an image-based state space, accurately predicting future events and modeling diverse behavioral modes are essential for safety and effective decision-making. World model-based Reinforcement Learning (WMRL) approaches offer a promising solution by simulating future states from the current state and actions. However, the utility of world models is often limited because typical RL policies are restricted to deterministic or single-Gaussian distributions; failing to capture the full spectrum of possible actions reduces their adaptability in complex, dynamic environments. In this work, we introduce Imagine-2-Drive, a framework consisting of two components: VISTAPlan, a high-fidelity world model for accurate future prediction, and the Diffusion Policy Actor (DPA), a diffusion-based policy that models multi-modal behaviors for trajectory prediction. We use VISTAPlan to simulate and evaluate trajectories from DPA, and apply Denoising Diffusion Policy Optimization (DDPO) to train DPA to maximize the cumulative sum of rewards over those trajectories. We analyze the benefits of each component and of the framework as a whole in CARLA using standard driving metrics. As a consequence of our twin novelties, VISTAPlan and DPA, we significantly outperform state-of-the-art (SOTA) world models on standard driving metrics, by 15% on Route Completion and 20% on Success Rate.