The safety of autonomous vehicles (AVs) has long been a top concern, largely because rare, safety-critical scenarios are scarce in the long-tail naturalistic driving distribution. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, focusing on generating high-risk driving scenarios and applying them to safety-critical testing of AV models. However, little work has explored reusing these extensive scenarios to iteratively improve AV models. Moreover, it remains intractable to filter through massive scenario libraries collected from other AV models with distinct behaviors in order to extract information transferable to the AV currently being improved. We therefore develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into a set of standardized sub-modules that allow flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task, estimating at each iteration the probability that the AV fails in each scenario. By then re-sampling historical scenarios according to these failure probabilities, CLIC tailors individualized curricula for downstream training, aligned with the AV's evaluated capability. In this way, CLIC not only maximizes the use of the vast pre-collected scenario library for closed-loop driving policy optimization, but also facilitates AV improvement by individualizing its training with the more challenging cases drawn from those poorly organized scenarios. Experimental results indicate that CLIC surpasses other curriculum-based training strategies, showing substantial improvement in handling risky scenarios while maintaining proficiency on simpler cases.
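The three sub-modules named above (AV Evaluation, Scenario Selection, AV Training) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `select_scenarios` and `clic_loop`, the `floor` parameter, and the toy evaluator and trainer are all hypothetical, introduced only to show how failure probabilities might drive re-sampling in a closed loop. In particular, the abstract does not specify the sampling rule; weighting directly by failure probability with a small floor is an assumption.

```python
import math
import random


def select_scenarios(failure_probs, k, rng, floor=0.05):
    """Scenario Selection (assumed rule): draw k scenario indices with
    replacement, weighted by predicted failure probability.

    `floor` is a hypothetical addition that keeps every scenario at a small
    non-zero weight so that easy cases are not dropped entirely.
    """
    weights = [max(p, floor) for p in failure_probs]
    return rng.choices(range(len(failure_probs)), weights=weights, k=k)


def clic_loop(scenarios, policy, evaluate, train, iterations, batch_size, seed=0):
    """Closed loop: evaluate -> select -> train, repeated each iteration."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # AV Evaluation: predicted failure probability per historical scenario.
        probs = [evaluate(policy, s) for s in scenarios]
        # Scenario Selection: individualized curriculum for the current policy.
        chosen = select_scenarios(probs, batch_size, rng)
        # AV Training: update the policy on the selected scenarios.
        policy = train(policy, [scenarios[i] for i in chosen])
    return policy


# Toy stand-ins (hypothetical): a policy is a scalar "skill" and a scenario
# is a scalar "difficulty". A real system would use a learned collision
# predictor and a driving-policy optimizer here.
def toy_evaluate(skill, difficulty):
    """Logistic failure probability that grows as difficulty exceeds skill."""
    return 1.0 / (1.0 + math.exp(-(difficulty - skill)))


def toy_train(skill, batch):
    """Raise skill in proportion to the mean failure probability of the batch."""
    mean_fail = sum(toy_evaluate(skill, d) for d in batch) / len(batch)
    return skill + 0.1 * mean_fail
```

For example, `clic_loop([0.0, 1.0, 2.0, 3.0], 0.0, toy_evaluate, toy_train, iterations=5, batch_size=4)` runs five evaluate, select, train rounds on four toy scenarios. Because the failure probabilities are recomputed every iteration, the sampling weights shift as the policy changes, which is the "individualized" aspect the abstract describes.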