Interest is growing in applying robust Multi-Modal Large Language Models (MLLMs) directly to autonomous driving. However, end-to-end autonomous driving models are costly and complex to design and train, which puts them out of reach for many enterprises and research groups. To address this, our study explores integrating MLLMs into autonomous driving systems through a Zero-Shot Chain-of-Thought (Zero-Shot-CoT) prompt design named PKRD-CoT. PKRD-CoT is built on the four fundamental capabilities of autonomous driving: perception, knowledge, reasoning, and decision-making. By working through these capabilities step by step, in a way that mimics human thought processes, it is particularly suited to understanding and responding to dynamic driving environments and to improving decision-making in real-time scenarios. Our design enables MLLMs to tackle problems without prior experience, which increases their utility in unstructured autonomous driving environments. In experiments, GPT-4.0 with PKRD-CoT performs strongly across autonomous driving tasks, demonstrating the effectiveness of the design. Our benchmark analysis also indicates that PKRD-CoT is viable for other MLLMs, such as Claude, LLaVA-1.6, and Qwen-VL-Plus. Overall, this study contributes a novel and unified prompt-design framework for GPT-4.0 and other MLLMs in autonomous driving, and evaluates these widely recognized MLLMs in the autonomous driving domain through comprehensive comparisons.
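The four-stage structure described in the abstract can be illustrated with a minimal sketch of a zero-shot chain-of-thought prompt builder. This is not the authors' prompt: the abstract does not give the actual PKRD-CoT wording, so the stage instructions, the `Stage` class, the `PKRD_STAGES` tuple, and the `build_pkrd_prompt` function are all hypothetical names and text chosen for illustration. The only elements taken from the source are the four stage names and the zero-shot setting (no worked examples in the prompt).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the chain of thought (hypothetical helper type)."""
    name: str
    instruction: str


# Stage names follow the abstract; the instruction wording is an assumption.
PKRD_STAGES = (
    Stage("Perception", "Describe the road users, lanes, and signals in the scene."),
    Stage("Knowledge", "State the traffic rules and driving knowledge that apply."),
    Stage("Reasoning", "Reason about how the scene may evolve and which risks follow."),
    Stage("Decision", "Choose one driving action and justify it briefly."),
)


def build_pkrd_prompt(scene_description: str) -> str:
    """Assemble a zero-shot prompt: no solved examples, only ordered steps."""
    lines = [
        "You are assisting with an autonomous driving task.",
        f"Scene: {scene_description}",
        "Let's think step by step, in this order:",
    ]
    for index, stage in enumerate(PKRD_STAGES, start=1):
        lines.append(f"{index}. {stage.name}: {stage.instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pkrd_prompt("A pedestrian is crossing ahead at an unsignalized crosswalk."))
```

The resulting string would be sent to an MLLM together with the camera image; that call is omitted here because it depends on the provider's API. Keeping the stages in a single ordered tuple makes it easy to reuse the same prompt across different models for comparison.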