In this work, we propose a self-improving artificial intelligence system that enhances the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms depends heavily on the diversity of training scenarios: a lack of safety-critical scenarios during training can result in poor generalization in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. Once AD failure scenarios are discovered, the RL agent's training is re-initiated via transfer learning to improve performance in the previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures in the action decisions of RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative application of the method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL.
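The verify-then-retrain loop described above can be sketched as follows. This is a minimal toy illustration under stated assumptions, not the authors' implementation: a single time-headway parameter stands in for the RL policy, uniform random sampling stands in for the black-box verification method, and a warm-started local search stands in for transfer learning. All function names, scenario ranges, and constants are hypothetical.

```python
import random

DT = 0.1         # simulation step [s] (assumed)
HORIZON = 150    # steps per episode, i.e. 15 s (assumed)
MAX_BRAKE = 6.0  # ego braking limit [m/s^2] (assumed)


def simulate(headway, scenario):
    """Run one toy ACC episode; return the minimum gap [m] (<= 0 means collision)."""
    gap, lead_decel = scenario
    ego_v = lead_v = 25.0
    min_gap = gap
    for _ in range(HORIZON):
        # Toy policy: brake fully whenever the gap is below headway * ego speed.
        ego_a = -MAX_BRAKE if gap < headway * ego_v else 0.0
        lead_v = max(0.0, lead_v - lead_decel * DT)
        ego_v = max(0.0, ego_v + ego_a * DT)
        gap += (lead_v - ego_v) * DT
        min_gap = min(min_gap, gap)
    return min_gap


def falsify(headway, rng, budget=100):
    """Stand-in for black-box verification: sample scenarios, keep the collisions."""
    scenarios = [(rng.uniform(15.0, 60.0), rng.uniform(1.0, 6.0))
                 for _ in range(budget)]
    return [s for s in scenarios if simulate(headway, s) <= 0.0]


def violation(headway, scenarios):
    """Total collision depth over a scenario set (0 means no collisions)."""
    return sum(max(0.0, -simulate(headway, s)) for s in scenarios)


def retrain(headway, failures, rng, steps=20):
    """Stand-in for transfer learning: local search warm-started from the
    current parameter, accepting only candidates that reduce the violation."""
    best, best_cost = headway, violation(headway, failures)
    for _ in range(steps):
        candidate = max(0.0, best + rng.gauss(0.0, 0.3))
        cost = violation(candidate, failures)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def self_improve(headway=0.2, iterations=5, seed=0):
    """Alternate failure discovery and retraining; log failures per iteration."""
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        failures = falsify(headway, rng)
        history.append(len(failures))
        if not failures:
            break
        headway = retrain(headway, failures, rng)
    return headway, history


if __name__ == "__main__":
    final_headway, history = self_improve()
    print("failures found per iteration:", history)
    print("final headway parameter:", final_headway)
```

The sketch only mirrors the structure of the loop: each iteration searches for colliding scenarios under the current policy and then continues optimization from the existing parameters on those scenarios rather than restarting from scratch. In this toy, retraining optimizes safety alone, so it ignores any trade-off with driving performance that a real reward function would encode.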