Autonomous driving requires robust perception models trained on high-quality, large-scale multi-view driving videos for tasks such as 3D object detection, segmentation, and trajectory prediction. While world models provide a cost-effective way to generate realistic driving videos, it remains challenging to ensure these videos adhere to fundamental physical principles, such as relative and absolute motion, spatial relationships like occlusion, spatial consistency, and temporal consistency. To address this, we propose DrivePhysica, a model designed to generate realistic multi-view driving videos that accurately adhere to essential physical principles through three key advancements: (1) a Coordinate System Aligner module that integrates relative and absolute motion features to enhance motion interpretation, (2) an Instance Flow Guidance module that ensures precise temporal consistency via efficient 3D flow extraction, and (3) a Box Coordinate Guidance module that improves spatial relationship understanding and accurately resolves occlusion hierarchies. Grounded in these physical principles, we achieve state-of-the-art performance in driving video generation quality (3.96 FID and 38.06 FVD on the nuScenes dataset) and in downstream perception tasks. Our project homepage: https://metadrivescape.github.io/papers_project/DrivePhysica/page.html