Recent language models (LMs) demonstrate strong semantic reasoning, which has enabled their use in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and cannot directly generate the continuous, physically feasible trajectories that motion planning requires. Diffusion models, in contrast, generate reliable and dynamically consistent trajectories but often lack semantic interpretability and alignment with scene-level understanding. To address both limitations, we propose \textbf{KnowDiffuser}, a knowledge-guided motion planning framework that tightly integrates the semantic understanding of LMs with the generative power of diffusion models. The framework uses an LM to infer context-aware meta-actions from structured scene representations; these meta-actions are then mapped to prior trajectories that anchor the subsequent denoising process. A two-stage truncated denoising mechanism refines these trajectories efficiently while preserving both semantic alignment and physical feasibility. Experiments on the nuPlan benchmark show that KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, yielding a robust and interpretable framework that bridges the semantic-to-physical gap in AD systems.
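To make the pipeline described in the abstract concrete, the following is a minimal sketch of the general idea of anchoring denoising on a meta-action prior: a meta-action is mapped to a coarse trajectory, that trajectory is noised to an intermediate diffusion step, and only the remaining reverse steps are run. Everything named here is an assumption for illustration and is not taken from the paper: the `META_ACTIONS` vocabulary, the `prior_trajectory` templates, the linear DDPM schedule, the choice of `start_step`, and the toy noise predictor that stands in for a trained, scene-conditioned network. The abstract does not specify how the two stages of the truncated denoising mechanism are organised, so the sketch shows only a single generic truncated pass.

```python
import numpy as np

# Hypothetical meta-action vocabulary: (speed scale, final lateral offset in metres).
# The abstract does not specify the actual set of meta-actions.
META_ACTIONS = {
    "keep_lane": (1.0, 0.0),
    "decelerate": (0.5, 0.0),
    "change_left": (1.0, 3.5),
    "change_right": (1.0, -3.5),
}


def prior_trajectory(meta_action, speed=10.0, horizon=16, dt=0.5):
    """Map a meta-action to a coarse (horizon, 2) array of x, y waypoints."""
    speed_scale, lateral = META_ACTIONS[meta_action]
    t = dt * np.arange(1, horizon + 1)
    x = speed * speed_scale * t      # constant-speed longitudinal motion
    y = lateral * (t / t[-1])        # linear ramp to the lateral offset
    return np.stack([x, y], axis=1)


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def make_toy_eps_model(reference, alpha_bars):
    """Stand-in for a trained noise predictor: treats `reference` as the clean sample."""
    def eps_model(x, t):
        return (x - np.sqrt(alpha_bars[t]) * reference) / np.sqrt(1.0 - alpha_bars[t])
    return eps_model


def truncated_denoise(prior, eps_model, start_step, schedule, rng):
    """Noise the prior to `start_step`, then run DDPM reverse steps down to step 0."""
    betas, alphas, alpha_bars = schedule
    # Forward-diffuse the prior to an intermediate step instead of sampling pure noise.
    noise = rng.standard_normal(prior.shape)
    x = np.sqrt(alpha_bars[start_step]) * prior + np.sqrt(1.0 - alpha_bars[start_step]) * noise
    for t in range(start_step, -1, -1):
        eps = eps_model(x, t)
        # DDPM posterior mean for the step t -> t-1.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x


schedule = make_schedule()
prior = prior_trajectory("change_left")
eps_model = make_toy_eps_model(prior, schedule[2])
plan = truncated_denoise(prior, eps_model, start_step=20, schedule=schedule,
                         rng=np.random.default_rng(0))
print(plan.shape)
```

Because the toy predictor always treats the prior as the clean sample, the refined plan collapses back onto the prior; a learned denoiser would instead pull the noised prior toward trajectories consistent with the scene. Starting at `start_step=20` rather than at the last of the 100 schedule steps is what makes the pass truncated: only 21 reverse steps are executed.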