Achieving reliable and efficient planning in complex driving environments requires a model that can reason over the scene's geometry, appearance, and dynamics. We present UniDWM, a unified driving world model that advances autonomous driving through multifaceted representation learning. UniDWM constructs a structure- and dynamics-aware latent world representation that serves as a physically grounded state space, enabling consistent reasoning across perception, prediction, and planning. Specifically, a joint reconstruction pathway learns to recover the scene's structure, including geometry and visual texture, while a collaborative generation framework leverages a conditional diffusion transformer to forecast future world evolution within the latent space. Furthermore, we show that UniDWM can be viewed as a variant of the variational autoencoder (VAE), which provides theoretical grounding for multifaceted representation learning. Extensive experiments demonstrate the effectiveness of UniDWM in trajectory planning, 4D reconstruction, and generation, highlighting the potential of multifaceted world representations as a foundation for unified driving intelligence. The code will be publicly available at https://github.com/Say2L/UniDWM.