Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang

Recent advancements in driving world models enable controllable generation of high-quality RGB videos or multimodal videos. Existing methods primarily focus on metrics related to generation quality and controllability. However, they often overlook the evaluation of downstream perception tasks, which are crucial to the performance of autonomous driving. Existing methods usually adopt a training strategy that first pretrains on synthetic data and then finetunes on real data, which results in twice as many epochs as the baseline (real data only). When we double the epochs in the baseline, the benefit of synthetic data becomes negligible. To thoroughly demonstrate the benefit of synthetic data, we introduce Dream4Drive, a novel synthetic data generation framework designed to enhance downstream perception tasks. Dream4Drive first decomposes the input video into several 3D-aware guidance maps and then renders 3D assets onto these guidance maps. Finally, the driving world model is fine-tuned to produce the edited, multi-view photorealistic videos, which can be used to train downstream perception models. Dream4Drive enables flexible generation of multi-view corner cases at scale, significantly boosting corner-case perception in autonomous driving. To facilitate future research, we also contribute a large-scale 3D asset dataset named DriveObj3D, which covers the typical categories in driving scenarios and enables diverse 3D-aware video editing. We conduct comprehensive experiments to show that Dream4Drive can effectively boost the performance of downstream perception models under various training epochs.

Project page: https://wm-research.github.io/Dream4Drive/
GitHub: https://github.com/wm-research/Dream4Drive
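
The epoch-budget argument in the abstract can be made concrete with a small sketch. The code below is only an illustration of the bookkeeping, not the authors' training code: the `Schedule` class, the `matched_baseline` helper, and the value of `BASE_EPOCHS` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """A training schedule described only by its epoch counts (hypothetical)."""
    name: str
    synthetic_epochs: int  # epochs spent on synthetic data
    real_epochs: int       # epochs spent on real data

    @property
    def total_epochs(self) -> int:
        return self.synthetic_epochs + self.real_epochs


def matched_baseline(schedule: Schedule) -> Schedule:
    """Real-data-only baseline with the same total epoch budget as `schedule`."""
    return Schedule(
        name=f"real-only ({schedule.total_epochs} epochs)",
        synthetic_epochs=0,
        real_epochs=schedule.total_epochs,
    )


BASE_EPOCHS = 24  # assumed value, chosen only for illustration

# Baseline as usually reported: real data only, one budget of epochs.
baseline = Schedule("real-only", 0, BASE_EPOCHS)

# Pretrain on synthetic data, then finetune on real data: two budgets in total.
pretrain_finetune = Schedule("synthetic pretrain + real finetune", BASE_EPOCHS, BASE_EPOCHS)

# Equal-budget comparison: give the real-only baseline the same total epochs.
fair_baseline = matched_baseline(pretrain_finetune)

for s in (baseline, pretrain_finetune, fair_baseline):
    print(f"{s.name}: {s.total_epochs} total epochs")
```

Under these assumed numbers, the pretrain-then-finetune schedule trains for twice as many epochs as the usual baseline, so the equal-budget comparison is against `fair_baseline`, not `baseline`.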

