Driving World Models (DWMs) have become essential for autonomous driving by enabling future scene prediction. However, existing DWMs are limited to scene generation and fail to incorporate scene understanding, which involves interpreting and reasoning about the driving environment. In this paper, we present a unified Driving World Model named HERMES, which seamlessly integrates 3D scene understanding and future scene evolution (generation) within a single framework for driving scenarios. Specifically, HERMES leverages a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions. We also introduce world queries, which incorporate world knowledge into BEV features via causal attention in a Large Language Model (LLM), enriching the context for both understanding and generation tasks. We conduct comprehensive studies on the nuScenes and OmniDrive-nuScenes datasets to validate the effectiveness of our method. HERMES achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%. The model and code will be publicly released at https://github.com/LMD0311/HERMES.
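The world-query mechanism described above can be sketched in miniature: learnable query tokens are appended after the flattened BEV features, and causal self-attention lets each query attend to every BEV token that precedes it. This is a toy illustration in pure Python, not the actual HERMES implementation; all names, dimensions, and values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def causal_self_attention(seq):
    """Single-head causal self-attention over a list of equal-length vectors.

    Token i attends only to tokens 0..i, so tokens placed at the end of the
    sequence (here, the world queries) can aggregate information from the
    entire preceding BEV context.
    """
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        # scaled dot-product scores against all visible (earlier) tokens
        scores = [sum(q[k] * seq[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(i + 1)]
        w = softmax(scores)
        # weighted sum of visible token values
        out.append([sum(w[j] * seq[j][k] for j in range(i + 1))
                    for k in range(d)])
    return out

# Toy flattened BEV features (in practice: multi-view features in BEV space)
bev_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical world queries (learnable parameters in a real model)
world_queries = [[0.5, 0.5]]

# Append queries after the BEV tokens; causal masking then lets each query
# read the full BEV context, yielding "enriched" world queries.
enriched = causal_self_attention(bev_tokens + world_queries)[len(bev_tokens):]
```

In a real DWM, the enriched queries would be decoded into future-scene predictions or understanding outputs; here they simply demonstrate the attention pattern that makes queries placed last see the whole BEV sequence.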