Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to outperform rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). The MLLM serves as a cognitive agent that brings human-like knowledge, interpretability, and common-sense reasoning to closed-loop planning. Specifically, PlanAgent leverages the MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description of the environment as inputs. Second, a Reasoning Engine module introduces a hierarchical chain-of-thought that proceeds from scene understanding to lateral and longitudinal motion instructions and culminates in planner code generation. Last, a Reflection module simulates and evaluates the generated planner to reduce the MLLM's uncertainty. PlanAgent inherits the common-sense reasoning and generalization capability of the MLLM, which enables it to handle both common and complex long-tailed scenarios. We evaluate PlanAgent on the large-scale and challenging nuPlan benchmarks. Comprehensive experiments show that PlanAgent outperforms the existing state-of-the-art in the closed-loop motion planning task. Code will be released soon.
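The three-module flow described above (environment transformation, reasoning, reflection) can be illustrated with a minimal sketch. This is not the PlanAgent implementation: every name here (`SceneInput`, `PlannerParams`, `transform_environment`, `plan_with_reflection`, the `reason` and `simulate` callables, the score threshold) is a hypothetical placeholder, the MLLM is abstracted as a plain callable, and the generated planner is simplified to two parameters rather than generated code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SceneInput:
    bev_map: list    # stand-in for a rendered BEV image (here: agent positions)
    lane_text: str   # lane-graph-based textual description


@dataclass
class PlannerParams:
    target_speed: float  # longitudinal instruction, in m/s (assumed unit)
    lane_offset: int     # lateral instruction: -1 left, 0 keep, +1 right


def transform_environment(raw_scene: dict) -> SceneInput:
    """Hypothetical Environment Transformation step: build both input modalities."""
    lanes = raw_scene.get("lanes", [])
    agents = raw_scene.get("agents", [])
    lane_text = f"lanes: {', '.join(lanes)}; agents: {len(agents)}"
    return SceneInput(bev_map=list(agents), lane_text=lane_text)


def plan_with_reflection(
    raw_scene: dict,
    reason: Callable[[SceneInput, Optional[str]], PlannerParams],
    simulate: Callable[[SceneInput, PlannerParams], float],
    min_score: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[PlannerParams, float]:
    """Generate a planner, score it in simulation, and retry with feedback."""
    scene = transform_environment(raw_scene)
    feedback: Optional[str] = None
    best: Optional[PlannerParams] = None
    best_score = float("-inf")
    for _ in range(max_rounds):
        candidate = reason(scene, feedback)   # Reasoning Engine (MLLM stand-in)
        score = simulate(scene, candidate)    # Reflection: simulate and evaluate
        if score > best_score:
            best, best_score = candidate, score
        if score >= min_score:
            break                             # good enough, stop reflecting
        feedback = f"previous plan {candidate} scored {score:.2f}; revise it"
    assert best is not None  # holds whenever max_rounds >= 1
    return best, best_score
```

In this sketch the reflection loop keeps the best-scoring candidate seen so far, so a later, worse proposal cannot replace an earlier, better one; whether the actual system behaves this way is an assumption.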