Validity Learning on Failures: Mitigating the Distribution Shift in Autonomous Vehicle Planning

The planning problem is a fundamental component of the autonomous driving framework. Recent advances in representation learning have enabled vehicles to understand their surroundings, which makes learning-based planning strategies practical. Among these approaches, Imitation Learning stands out for its training efficiency. However, traditional Imitation Learning methods suffer from covariate shift: in closed loop, the learned planner visits states that differ from those covered by the expert demonstrations. We propose Validity Learning on Failures, VL(on failure), to address this issue. The core of our method is to deploy a pre-trained planner across diverse scenarios. Instances where the planner fails to meet its immediate objectives, such as keeping a safe distance from obstacles or obeying traffic rules, are flagged as failures. The states corresponding to these failures are compiled into a new dataset, termed the failure dataset. Because this data has no expert annotations, standard imitation learning approaches cannot be applied to it. To learn from these closed-loop mistakes, we introduce the VL objective, which aims to discern valid trajectories in the current environmental context. Experiments in both reactive CARLA simulations and non-reactive log-replay simulations show substantial improvements in closed-loop metrics such as Score, Progress, and Success Rate, underscoring the effectiveness of the proposed method. Further evaluation on the Bench2Drive benchmark shows that VL(on failure) outperforms state-of-the-art methods by a large margin.
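To make the data flow in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how failures are detected, how validity labels are obtained, or what loss the VL objective uses. This sketch assumes a hypothetical 2 m safety margin, a single red-light flag standing in for "traffic rules", rule-based validity labels for candidate trajectories derived from the same safety criterion, and binary cross-entropy as a stand-in validity loss. All names (`State`, `is_failure`, `collect_failure_dataset`, `trajectory_is_valid`, `validity_bce`, `SAFE_DISTANCE_M`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical safety margin (metres); the abstract does not specify a threshold.
SAFE_DISTANCE_M = 2.0

Point = tuple[float, float]


@dataclass
class State:
    """Simplified closed-loop state: ego position, static obstacle positions,
    and one example traffic-rule flag (assumption, not the paper's state)."""
    ego_xy: Point
    obstacles: list[Point]
    ran_red_light: bool = False


def is_failure(state: State, safe_distance: float = SAFE_DISTANCE_M) -> bool:
    """Flag a state when the ego is closer than safe_distance to any obstacle
    or has broken the (single, example) traffic rule."""
    too_close = any(math.dist(state.ego_xy, ob) < safe_distance for ob in state.obstacles)
    return too_close or state.ran_red_light


def collect_failure_dataset(rollouts: list[list[State]]) -> list[State]:
    """Keep only the failure states from closed-loop rollouts of a pre-trained
    planner. No expert trajectory is attached to these states."""
    return [s for rollout in rollouts for s in rollout if is_failure(s)]


def trajectory_is_valid(traj: list[Point], state: State,
                        safe_distance: float = SAFE_DISTANCE_M) -> bool:
    """Assumed validity rule: every waypoint keeps safe_distance from every
    obstacle (obstacles are treated as static for simplicity)."""
    return all(math.dist(p, ob) >= safe_distance for p in traj for ob in state.obstacles)


def validity_bce(scores: list[float], labels: list[float]) -> float:
    """Mean binary cross-entropy between predicted validity probabilities and
    0/1 validity labels (a stand-in for the unspecified VL objective)."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(scores)


# Demo: one rollout whose second state is only 1 m from an obstacle.
rollout = [
    State(ego_xy=(0.0, 0.0), obstacles=[(10.0, 0.0)]),
    State(ego_xy=(9.0, 0.0), obstacles=[(10.0, 0.0)]),
]
failures = collect_failure_dataset([rollout])

# Label two candidate trajectories from the failure state without any expert.
candidates = [
    [(9.5, 0.0), (10.0, 0.0)],  # drives into the obstacle
    [(9.5, 3.0), (10.0, 3.0)],  # swerves around it
]
labels = [float(trajectory_is_valid(t, failures[0])) for t in candidates]
loss = validity_bce([0.1, 0.9], labels)  # scores a hypothetical model might output
```

The point the sketch illustrates is that failure states carry no expert trajectory. Supervision therefore comes from judging whether candidate trajectories are valid in that state, rather than from imitating a single demonstrated action. In a real system, the scores would come from the planner's network, and validity would be judged by a richer checker (moving agents, lanes, signals).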

翻译：规划问题是自动驾驶框架中的一个基础性环节。近年来，表征学习领域的进展使得车辆能够理解其周围环境，从而促进了基于学习的规划策略的集成。在这些方法中，模仿学习因其显著的训练效率而备受关注。然而，传统的模仿学习方法面临着协变量偏移现象带来的挑战。我们提出基于失败的有效性学习（VL(on failure)）作为解决此问题的方法。我们方法的核心在于将预训练的规划器部署于多样化场景中。当规划器偏离其即时目标（例如与障碍物保持安全距离或遵守交通规则）时，这些实例被标记为失败。与这些失败对应的状态被收集到一个新的数据集中，称为失败数据集。值得注意的是，由于该数据缺乏专家标注，标准的模仿学习方法无法直接适用。为了促进从闭环错误中学习，我们引入了VL目标，其旨在识别当前环境背景下的有效轨迹。在反应式CARLA仿真和非反应式日志回放仿真上进行的实验评估表明，在闭环指标（如\textit{Score, Progress}和成功率）方面均有显著提升，这证明了所提方法的有效性。在Bench2Drive基准测试上的进一步评估显示，VL(on failure)大幅超越了现有最先进方法。