In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous IL work shows sample inefficiency and poor generalisation in safety-critical scenarios, on which it is rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model that uses a spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weaknesses of the learnt IL policy by synthetically generating critical scenarios through optimisation of the parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with the augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated into the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve learnt IL policies in closed loop. Finally, we show that our IL planner, developed with fewer training resources, still outperforms the previous state of the art.