Planning is a fundamental component of the autonomous driving stack. Recent advances in representation learning have enabled vehicles to understand their surroundings, which in turn makes learning-based planning practical. Among these approaches, Imitation Learning stands out for its training efficiency. However, traditional Imitation Learning suffers from covariate shift: the states a planner visits when its own outputs are executed in closed loop drift away from the states covered by the expert demonstrations. We propose Learn from Mistakes (LfM) to address this issue. The essence of LfM is to deploy a pre-trained planner across diverse scenarios. Instances where the planner fails to meet its immediate objectives, such as maintaining a safe distance from obstacles or adhering to traffic rules, are flagged as mistakes. The environments corresponding to these mistakes are treated as out-of-distribution states and compiled into a new dataset, termed the closed-loop mistakes dataset. Because the closed-loop data has no expert annotations, standard imitation learning cannot be applied to it. To learn from the closed-loop mistakes, we introduce Validity Learning, a weakly supervised method that aims to identify valid trajectories in the current environmental context. Experiments on the inD and nuPlan datasets show substantial improvements in closed-loop metrics such as Progress and Collision Rate, underscoring the effectiveness of the proposed method.
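The mistake-collection step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the one-dimensional `State`, the `SAFE_DISTANCE` threshold, the `is_mistake` rule (only the safe-distance objective is checked), and the choice to store the state in which the violation is observed are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class State:
    """Toy 1D scene (hypothetical): ego vehicle and one static obstacle ahead."""
    ego_position: float
    obstacle_position: float


# Assumed safety threshold in metres; not taken from the paper.
SAFE_DISTANCE = 2.0


def is_mistake(state: State) -> bool:
    """Flag a state in which the gap to the obstacle is below the safe distance."""
    return state.obstacle_position - state.ego_position < SAFE_DISTANCE


def collect_mistakes(
    planner: Callable[[State], float],
    initial_states: List[State],
    horizon: int = 20,
) -> List[State]:
    """Roll the planner out in closed loop and gather the flagged states.

    The planner maps a state to a forward displacement for the next step.
    Each rollout stops at its first mistake, and no expert label is stored,
    mirroring the unannotated closed-loop mistakes dataset.
    """
    mistakes: List[State] = []
    for start in initial_states:
        state = start
        for _ in range(horizon):
            step = planner(state)
            # The planner's own output determines the next state (closed loop).
            state = State(state.ego_position + step, state.obstacle_position)
            if is_mistake(state):
                mistakes.append(state)
                break
    return mistakes
```

With a planner that always advances by a fixed step, a rollout that starts far enough from the obstacle eventually yields one flagged state, whereas a planner that slows down as the gap shrinks yields none. How such flagged states are then used for training is the role of Validity Learning, which this sketch does not attempt to reproduce.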