Toward Physically Consistent Driving Video World Models under Challenging Trajectories

Video generation models have shown strong potential as world models for autonomous driving simulation. However, existing approaches are primarily trained on real-world driving datasets, which mostly contain natural and safe driving scenarios. As a result, current models often fail when conditioned on challenging or counterfactual trajectories, such as imperfect trajectories generated by simulators or planning systems, and produce videos with severe physical inconsistencies and artifacts. To address this limitation, we propose PhyGenesis, a world model designed to generate driving videos with high visual fidelity and strong physical consistency. Our framework consists of two key components: (1) a physical condition generator that transforms potentially invalid trajectory inputs into physically plausible conditions, and (2) a physics-enhanced video generator that produces high-fidelity multi-view driving videos under these conditions. To effectively train these components, we construct a large-scale, physics-rich heterogeneous dataset. Specifically, in addition to real-world driving videos, we generate diverse challenging driving scenarios using the CARLA simulator, from which we derive supervision signals that guide the model to learn physically grounded dynamics under extreme conditions. This challenging-trajectory learning strategy enables trajectory correction and promotes physically consistent video generation. Extensive experiments demonstrate that PhyGenesis consistently outperforms state-of-the-art methods, especially on challenging trajectories. Our project page is available at https://wm-research.github.io/PhyGenesis/.
