The safety of autonomous vehicles (AVs) is a long-standing concern, largely because rare, safety-critical scenarios are scarce in the long-tailed distribution of naturalistic driving data. To address this, a growing body of work on scenario-based autonomous driving generates high-risk driving scenarios and uses them for safety-critical testing of AV models. However, little work has explored reusing these large scenario collections to iteratively improve AV models. Moreover, it remains difficult to filter massive scenario libraries collected from other AV models with distinct behaviors and extract information that transfers to the AV currently being improved. We therefore develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into three standardized sub-modules that allow flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task: at each iteration, it estimates the probability that the AV fails in each scenario. By re-sampling from historical scenarios according to these failure probabilities, CLIC then tailors individualized curricula for downstream training, aligned with the evaluated capability of the AV. CLIC thus both makes full use of the large pre-collected scenario library for closed-loop driving policy optimization and facilitates AV improvement by individualizing training with the more challenging cases drawn from those poorly organized scenarios. Experimental results indicate that CLIC surpasses other curriculum-based training strategies, substantially improving the handling of risky scenarios while maintaining proficiency on simpler cases.
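The Scenario Selection step, re-sampling historical scenarios according to predicted failure probabilities, can be illustrated with a minimal sketch. The abstract does not state the exact sampling rule, so sampling in proportion to the predicted failure probability is an assumption here, and the function name `sample_curriculum` and its parameters are hypothetical rather than taken from CLIC.

```python
import random


def sample_curriculum(failure_probs, k, seed=0):
    """Draw k scenario indices, weighted by predicted failure probability.

    Assumption: weights are directly proportional to the predicted
    probabilities; the actual CLIC rule may differ.
    """
    rng = random.Random(seed)
    weights = list(failure_probs)
    if sum(weights) <= 0:
        # No scenario is predicted to fail: fall back to uniform sampling.
        weights = [1.0] * len(weights)
    # Sampling with replacement, so hard scenarios can appear repeatedly.
    return rng.choices(range(len(weights)), weights=weights, k=k)


# Hypothetical predictor outputs for three stored scenarios.
predicted = [0.0, 0.9, 0.1]
curriculum = sample_curriculum(predicted, k=8)
print(curriculum)
```

Under this rule a scenario with zero predicted failure probability is never selected. A practical variant might mix in a share of uniformly sampled scenarios so that easier cases stay in the curriculum.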