Current autonomous vehicles rely mainly on their individual sensors to understand surrounding scenes and plan future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they typically focus on perception tasks such as detection or tracking. How those approaches contribute to overall cooperative planning performance remains under-explored. Inspired by recent progress in using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates a Multimodal LLM into cooperative autonomous driving, along with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method, Vehicle-to-Vehicle Multimodal Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer various types of driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and that it outperforms other baseline methods that use different fusion approaches. Our work also opens a new research direction that can improve the safety of future autonomous driving systems. The code and data will be released to the public to facilitate open-source research in this field. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/.