Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet these systems do not capitalize on the reasoning ability and experiential knowledge that human drivers rely on. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments show that Agent-Driver outperforms state-of-the-art driving methods by a large margin. Our approach also demonstrates better interpretability and few-shot learning ability than these methods. Code will be released.