PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving

There is growing interest in applying capable Multi-Modal Large Language Models (MLLMs) directly to autonomous driving. However, end-to-end autonomous driving models are costly and complex to design and train, which puts them out of reach for many enterprises and research groups. To address this, our study explores integrating MLLMs into autonomous driving systems through a Zero-Shot Chain-of-Thought (Zero-Shot-CoT) prompt design named PKRD-CoT. PKRD-CoT is built on four fundamental capabilities of autonomous driving: perception, knowledge, reasoning, and decision-making. Because the prompt walks the model through these capabilities step by step, mimicking the human thought process, it is well suited to understanding and responding to dynamic driving environments and to improving decisions in real-time scenarios. The design lets MLLMs tackle problems without task-specific examples or training, which increases their utility in unstructured driving environments. In experiments, GPT-4.0 with PKRD-CoT performs strongly across autonomous driving tasks. Our benchmark analysis further indicates that PKRD-CoT is a promising fit for other MLLMs, such as Claude, LLaVA-1.6, and Qwen-VL-Plus. Overall, this study contributes a novel, unified prompt-design framework for GPT-4.0 and other MLLMs in autonomous driving, and evaluates these widely used MLLMs in the driving domain through comprehensive comparisons.
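As a rough illustration of what a four-stage zero-shot chain-of-thought prompt could look like, the sketch below assembles a text prompt that asks a model to work through perception, knowledge, reasoning, and decision-making in order. Only the four stage names come from the abstract. The function name `build_pkrd_prompt`, the per-stage instruction wording, and the plain-text scene description are hypothetical and are not the prompts used in the paper.

```python
# Stage names follow the abstract; the instruction text for each stage is an
# illustrative assumption, not the wording used by the PKRD-CoT authors.
STAGES = (
    ("Perception", "Describe the lanes, signals, and road users visible in the scene."),
    ("Knowledge", "State the traffic rules and driving knowledge that apply here."),
    ("Reasoning", "Reason about how the scene may evolve and which risks follow."),
    ("Decision-making", "Choose one driving action and justify it briefly."),
)


def build_pkrd_prompt(scene_description: str) -> str:
    """Build a zero-shot prompt: no worked examples, only staged instructions."""
    lines = [
        "You are assisting an autonomous driving system.",
        f"Scene: {scene_description}",
        "Let's think step by step.",
    ]
    # Number the stages so the model answers them in a fixed order.
    for index, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"Step {index} ({name}): {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pkrd_prompt("A pedestrian waits at a crosswalk; the light is green."))
```

The resulting string would be sent to an MLLM together with the camera image. Keeping the stages in a single ordered tuple makes it easy to reword or reorder them when comparing models.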

