DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

World models are drawing extensive attention in autonomous driving because of their capacity to comprehend driving environments. An established world model holds great potential for generating high-quality driving videos and driving policies for safe maneuvering. However, a critical limitation of existing research is its predominant focus on gaming environments or simulated settings, which do not represent real-world driving scenarios. We therefore introduce DriveDreamer, the first world model derived entirely from real-world driving scenarios. Because modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing a powerful diffusion model to construct a comprehensive representation of the complex environment. We further introduce a two-stage training pipeline: in the first stage, DriveDreamer learns structured traffic constraints, and in the second stage it learns to anticipate future states. We instantiate DriveDreamer on the challenging nuScenes benchmark, and extensive experiments verify that it enables precise, controllable video generation that faithfully captures the structural constraints of real-world traffic scenarios. DriveDreamer also generates realistic and reasonable driving policies, opening avenues for interaction and practical applications.

