The safety of autonomous vehicles (AVs) has long been a top concern, stemming from the absence of rare, safety-critical scenarios in the long-tail naturalistic driving distribution. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, focusing on generating high-risk driving scenarios and using them for safety-critical testing of AV models. However, little work has explored reusing these extensive scenario collections to iteratively improve AV models. Moreover, it remains challenging to filter large scenario libraries collected from other AV models with distinct behaviors and extract information that transfers to the AV currently being improved. We therefore develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into three standardized sub-modules that allow flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task: at each iteration, it estimates the probability that the AV fails in each scenario. By re-sampling from historical scenarios according to these failure probabilities, CLIC then tailors individualized curricula for downstream training, aligning them with the evaluated capability of the AV. CLIC thus makes full use of the vast pre-collected scenario library for closed-loop driving policy optimization, and it facilitates AV improvement by individualizing training with the more challenging cases drawn from those poorly organized scenarios. Experimental results indicate that CLIC outperforms other curriculum-based training strategies, showing substantial improvement in handling risky scenarios while maintaining proficiency in simpler cases.
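The Scenario Selection step described above, re-sampling historical scenarios according to predicted failure probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the re-sampling rule, so sampling with replacement in proportion to the predicted failure probability is assumed here. The names `select_curriculum`, `toy_predictor`, `min_gap_m`, and the smoothing constant `eps` are all hypothetical, and the toy predictor merely stands in for a learned collision predictor.

```python
import random
from typing import Callable, List, Sequence


def select_curriculum(
    scenarios: Sequence[dict],
    predict_failure: Callable[[dict], float],
    k: int,
    rng: random.Random,
    eps: float = 1e-6,
) -> List[dict]:
    """Draw k scenarios, favoring those where the AV is predicted to fail.

    Assumption: weights are proportional to the predicted failure
    probability; the abstract does not state the exact rule.
    """
    # AV Evaluation (stand-in): one failure probability per stored scenario,
    # clipped to [0, 1] in case the predictor is not calibrated.
    probs = [min(max(predict_failure(s), 0.0), 1.0) for s in scenarios]
    # Scenario Selection: eps (hypothetical) keeps every weight positive so
    # that sampling still works if all predicted probabilities are zero.
    weights = [p + eps for p in probs]
    # Sample with replacement; harder scenarios appear more often.
    return rng.choices(list(scenarios), weights=weights, k=k)


# Hypothetical three-scenario library; "min_gap_m" is an invented feature.
library = [
    {"id": "easy", "min_gap_m": 30.0},
    {"id": "medium", "min_gap_m": 10.0},
    {"id": "hard", "min_gap_m": 2.0},
]


def toy_predictor(scenario: dict) -> float:
    # Toy stand-in for a collision predictor: smaller gap, higher risk.
    return 1.0 / (1.0 + scenario["min_gap_m"])


curriculum = select_curriculum(library, toy_predictor, k=1000, rng=random.Random(0))

# Count how often each scenario was drawn into the curriculum.
counts = {s["id"]: 0 for s in library}
for s in curriculum:
    counts[s["id"]] += 1
print(counts)
```

In a closed loop, the predictor would be re-fitted after each round of AV Training, so the sampling weights would track the current capability of the AV rather than stay fixed.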