In autonomous driving, predicting future events and evaluating foreseeable risks in advance enables autonomous vehicles to plan their actions better, enhancing safety and efficiency on the road. To this end, we propose Drive-WM, the first driving world model compatible with existing end-to-end planning models. Through joint spatial-temporal modeling facilitated by view factorization, our model generates high-fidelity multiview videos of driving scenes. Building on this generation ability, we demonstrate, for the first time, the potential of applying a world model to safe driving planning. In particular, Drive-WM can drive into multiple futures conditioned on distinct driving maneuvers and determine the optimal trajectory according to image-based rewards. Evaluation on real-world driving datasets shows that our method generates high-quality, consistent, and controllable multiview videos, opening up possibilities for real-world simulation and safe planning.
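The planning idea summarized above (imagine one future per candidate maneuver, score each imagined future with a reward, keep the best) can be sketched as a simple argmax loop. This is a minimal illustration under stated assumptions, not Drive-WM's implementation: `select_maneuver`, `toy_imagine`, and `toy_reward` are hypothetical names, and the toy "futures" are lists of per-frame risk scores standing in for generated multiview videos and an image-based reward.

```python
from typing import Callable, List, Sequence, TypeVar

Future = TypeVar("Future")


def select_maneuver(
    maneuvers: Sequence[str],
    imagine: Callable[[str], Future],
    reward: Callable[[Future], float],
) -> str:
    """Return the maneuver whose imagined future receives the highest reward.

    `imagine` is a hypothetical stand-in for a world model that rolls out a
    future conditioned on a maneuver; `reward` is a hypothetical stand-in for
    an image-based reward computed on that future.
    """
    if not maneuvers:
        raise ValueError("at least one candidate maneuver is required")
    # One rollout per maneuver; ties go to the earliest maneuver in the list.
    return max(maneuvers, key=lambda m: reward(imagine(m)))


# Toy example (hypothetical data): each "future" is a list of per-frame risk scores.
toy_risk = {
    "keep_lane": [0.1, 0.2],
    "turn_left": [0.5, 0.9],
    "turn_right": [0.3, 0.4],
}


def toy_imagine(maneuver: str) -> List[float]:
    return toy_risk[maneuver]


def toy_reward(future: List[float]) -> float:
    # Lower average risk means a higher reward.
    return -sum(future) / len(future)


best = select_maneuver(list(toy_risk), toy_imagine, toy_reward)
print(best)
```

Keeping the world model and the reward as plain callables separates the "drive into multiple futures" step from the scoring step, so either could be swapped for a real generator or reward model without changing the selection logic.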