V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

Current autonomous vehicles rely mainly on their own sensors to understand the surrounding scene and plan future trajectories, which can be unreliable when those sensors malfunction or are occluded. To address this problem, cooperative perception methods based on vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on perception tasks such as detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress in using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates a Multi-Modal LLM into cooperative autonomous driving, together with the Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose a baseline method, the Vehicle-to-Vehicle Multi-Modal Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer various types of driving-related questions: grounding, notable object identification, and planning. Experimental results show that V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and that it outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. The code and data will be released to the public to facilitate open-source research in this field. Project website: https://eddyhkchiu.github.io/v2vllm.github.io/
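To make the described setting concrete, the following is a minimal sketch of how perception features from several CAVs might be gathered into a single input alongside a driving question. It is not the authors' implementation: the names `CavFeatures`, `build_llm_input`, and the dictionary keys are hypothetical, and the choice of plain concatenation (leaving the actual fusion to the language model) is an assumption made only for illustration. The three question types are the ones the abstract names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class QuestionType(Enum):
    # The three question families named in the abstract.
    GROUNDING = "grounding"
    NOTABLE_OBJECT = "notable object identification"
    PLANNING = "planning"


@dataclass
class CavFeatures:
    # Hypothetical container: one CAV's perception output as feature vectors.
    cav_id: str
    tokens: List[List[float]]


def build_llm_input(cavs: List[CavFeatures], question: str,
                    qtype: QuestionType) -> Dict:
    """Assemble one input: every CAV's tokens, followed by the question.

    Assumption: fusion is left to the language model, so this step only
    concatenates tokens and records which CAV each token came from.
    """
    fused_tokens: List[List[float]] = []
    sources: List[str] = []
    for cav in cavs:
        fused_tokens.extend(cav.tokens)
        sources.extend([cav.cav_id] * len(cav.tokens))
    return {
        "perception_tokens": fused_tokens,
        "token_sources": sources,
        "question_type": qtype.value,
        "question": question,
    }


if __name__ == "__main__":
    ego = CavFeatures("cav_ego", [[0.1, 0.2], [0.3, 0.4]])
    other = CavFeatures("cav_1", [[0.5, 0.6]])
    sample = build_llm_input([ego, other],
                             "Which objects should the ego vehicle watch?",
                             QuestionType.NOTABLE_OBJECT)
    print(len(sample["perception_tokens"]), sample["token_sources"])
```

Keeping a per-token source label is one simple way to let a downstream model distinguish the ego vehicle's observations from those shared by other CAVs; the paper's actual input format may differ.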

