Current state-of-the-art autonomous vehicles can face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced cooperative driving framework further incorporates a Multimodal Large Language Model (MLLM) to integrate the cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been explored by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts incorporates two novel ideas: occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines on cooperative perception, prediction, and planning tasks. Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/ .