Most human-machine interaction (HMI) research overlooks the maneuvering needs of passengers in autonomous driving (AD). Natural language offers an intuitive interface, yet translating passengers' open-ended instructions into control signals without sacrificing interpretability and traceability remains a challenge. This study proposes an instruction-realization framework that uses a large language model (LLM) to interpret instructions, generates executable scripts that schedule multiple model predictive control (MPC)-based motion planners according to real-time feedback, and converts the planned trajectories into control signals. This scheduling-centric design decouples semantic reasoning from vehicle control at different timescales, establishing a transparent, traceable decision-making chain from high-level instructions to low-level actions. Because high-fidelity evaluation tools are lacking, this study also introduces a benchmark for open-ended instruction realization in a closed-loop setting. Comprehensive experiments show that the framework significantly improves task-completion rates over instruction-realization baselines, reduces LLM query costs, achieves safety and compliance on par with specialized AD approaches, and exhibits considerable tolerance to LLM inference latency.
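The scheduling idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the `Feedback` fields, the planner names, the thresholds, and the `overtake_script` function are hypothetical, and the planners are trivial stand-ins that return a setpoint rather than solving an MPC problem. The point is only the separation of roles: a script (of the kind an LLM might emit once per instruction) picks a planner from real-time feedback at every control step, and each choice is logged so the decision chain stays traceable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Feedback:
    """Hypothetical real-time feedback exposed to the scheduling script."""
    ego_speed: float       # m/s
    gap_ahead: float       # m, distance to the lead vehicle
    left_lane_free: bool


# A plan here is just (target_speed, lane_offset); a real MPC planner
# would return an optimised trajectory instead.
Plan = Tuple[float, int]


def follow_lane(fb: Feedback) -> Plan:
    # Stand-in: speed up gently, capped at 15 m/s, stay in lane.
    return (min(fb.ego_speed + 1.0, 15.0), 0)


def change_left(fb: Feedback) -> Plan:
    # Stand-in: hold speed, move one lane to the left.
    return (fb.ego_speed, -1)


def slow_down(fb: Feedback) -> Plan:
    # Stand-in: decelerate, never below standstill.
    return (max(fb.ego_speed - 2.0, 0.0), 0)


PLANNERS: Dict[str, Callable[[Feedback], Plan]] = {
    "follow_lane": follow_lane,
    "change_left": change_left,
    "slow_down": slow_down,
}


def overtake_script(fb: Feedback) -> str:
    """Hypothetical script for the instruction 'overtake the car ahead'.

    It only names a planner; it never produces control signals itself.
    """
    if fb.gap_ahead > 30.0:
        return "follow_lane"
    if fb.left_lane_free:
        return "change_left"
    return "slow_down"


def step(script: Callable[[Feedback], str], fb: Feedback,
         log: List[Tuple[str, Plan]]) -> Plan:
    """Run one scheduling step and record the decision for traceability."""
    name = script(fb)
    plan = PLANNERS[name](fb)
    log.append((name, plan))
    return plan
```

In this sketch the language model would be queried once to produce `overtake_script`, while `step` runs at the control rate without further queries, which is one way to picture how a scheduling-centric design can keep query costs and latency sensitivity low.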