Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan future trajectories, which can be unreliable when the sensors malfunction or are occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on perception tasks such as detection or tracking. How these approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress in using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates a Multi-Modal LLM into cooperative autonomous driving, together with the Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose a baseline method, the Vehicle-to-Vehicle Multi-Modal Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer several types of driving-related questions: grounding, notable object identification, and planning. Experimental results show that V2V-LLM is a promising unified model architecture for performing various tasks in cooperative autonomous driving, and that it outperforms other baseline methods that use different fusion approaches. Our work also opens a new research direction that can improve the safety of future autonomous driving systems. The code and data will be released to the public to facilitate open-source research in this field. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/