With advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has made great progress recently. IL-based methods have become the mainstream paradigm: models rely on standard driving behaviors provided by experts and learn to minimize the discrepancy between their own actions and the expert actions. However, this objective of "only driving like the expert" generalizes poorly: in rare or unseen long-tail scenarios outside the distribution of expert demonstrations, models tend to make unsafe decisions because they lack prior experience. This raises a fundamental question: can an E2E-AD system make reliable decisions without any expert action supervision? Motivated by this question, we propose Risk-aware World Model Predictive Control (RaWMPC), a unified framework that addresses this generalization dilemma through robust control, without relying on expert demonstrations. In practice, RaWMPC uses a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation. To enable the world model to predict the outcomes of risky driving behaviors, we design a risk-aware interaction strategy that systematically exposes the world model to hazardous behaviors, making catastrophic outcomes predictable and thus avoidable. Furthermore, to generate low-risk candidate actions at test time, we introduce a self-evaluation distillation method that distills risk-avoidance capabilities from the well-trained world model into a generative action proposal network, again without any expert demonstration. Extensive experiments show that RaWMPC outperforms state-of-the-art methods in both in-distribution and out-of-distribution scenarios, while providing superior decision interpretability.
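The selection step described in the abstract (predict the consequences of several candidate actions, score each for risk, keep the lowest-risk one) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional kinematic `toy_world_model` stands in for a learned world model, the hand-written `risk` function stands in for the explicit risk evaluation, and the fixed list of candidate accelerations stands in for the output of a generative action proposal network.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class State:
    position: float  # ego position along the lane, in metres
    speed: float     # ego speed, in metres per second


def toy_world_model(state: State, accel: float, dt: float = 0.5) -> State:
    """Hypothetical stand-in for a learned world model: one kinematic step."""
    speed = max(0.0, state.speed + accel * dt)  # the vehicle cannot reverse
    return State(position=state.position + speed * dt, speed=speed)


def rollout(state: State, accel: float, horizon: int) -> List[State]:
    """Predict the future states that follow from holding one candidate action."""
    states = []
    for _ in range(horizon):
        state = toy_world_model(state, accel)
        states.append(state)
    return states


def risk(states: Sequence[State], obstacle_pos: float, safe_gap: float = 5.0) -> float:
    """Illustrative risk score (an assumption, not the paper's definition).

    A large penalty applies if the predicted gap to the obstacle ever drops
    below `safe_gap`; a small term discourages overly conservative stops.
    """
    min_gap = min(obstacle_pos - s.position for s in states)
    violation_penalty = 100.0 if min_gap < safe_gap else 0.0
    conservatism_penalty = 0.1 * max(min_gap, 0.0)
    return violation_penalty + conservatism_penalty


def select_low_risk_action(
    state: State,
    candidates: Sequence[float],
    obstacle_pos: float,
    horizon: int = 6,
) -> Tuple[float, float]:
    """Score every candidate by its predicted risk and return (risk, action)."""
    scored = [(risk(rollout(state, a, horizon), obstacle_pos), a) for a in candidates]
    return min(scored)


if __name__ == "__main__":
    ego = State(position=0.0, speed=10.0)
    candidate_accels = [-4.0, -2.0, 0.0, 2.0]  # placeholder for proposed actions
    best_risk, best_accel = select_low_risk_action(ego, candidate_accels, obstacle_pos=30.0)
    print(f"selected acceleration: {best_accel} m/s^2 (risk {best_risk:.2f})")
```

In this toy setting, holding speed or accelerating brings the ego vehicle within the safety gap of the obstacle, so both candidates receive the large penalty, while the mild braking action is preferred over hard braking because it is safe yet less conservative. The per-candidate scores also show why a given action was rejected, which loosely mirrors the interpretability the abstract attributes to explicit risk evaluation.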