SECRM-2D: RL-Based Efficient and Comfortable Route-Following Autonomous Driving with Analytic Safety Guarantees

Over the last decade, there has been increasing interest in autonomous driving systems. Reinforcement Learning (RL) shows great promise for training autonomous driving controllers, as it can directly optimize a combination of criteria such as efficiency, comfort, and stability. However, RL-based controllers typically offer no safety guarantees, making their readiness for real deployment questionable. In this paper, we propose SECRM-2D (the Safe, Efficient and Comfortable RL-based driving Model with Lane-Changing), an RL autonomous driving controller (both longitudinal and lateral) that balances optimization of efficiency and comfort and follows a fixed route, while being subject to hard analytic safety constraints. These safety constraints are derived from the criterion that the follower vehicle must have sufficient headway to avoid a crash if the leader vehicle brakes suddenly. We evaluate SECRM-2D against several learning and non-learning baselines in simulated test scenarios, including freeway driving, exiting, merging, and emergency braking. Our results confirm that representative previously published RL AV controllers may crash in both training and testing, even if they are optimizing a safety objective. By contrast, our controller SECRM-2D avoids crashes during both training and testing, improves over the baselines in measures of efficiency and comfort, and follows the prescribed route more faithfully. In addition, we obtain a theoretical characterization of the longitudinal steady state of a collection of SECRM-2D vehicles.
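To make the headway criterion concrete, the following is a minimal Python sketch of one common way such a condition can be written. It is not taken from the paper: the constant and equal maximum deceleration for both vehicles, the follower reaction time, the minimum standstill gap, and all function and parameter names (`final_gap`, `is_safe`, `max_safe_speed`, `max_decel`, `reaction_time`, `min_gap`) are illustrative assumptions, and the actual SECRM-2D constraints may differ.

```python
import math


def final_gap(gap, v_follower, v_leader, max_decel, reaction_time):
    """Gap (m) left once both vehicles have stopped.

    Assumption (not from the paper): the leader brakes at max_decel
    immediately; the follower keeps its speed for reaction_time seconds
    and then brakes at the same max_decel.
    """
    leader_stop = v_leader ** 2 / (2.0 * max_decel)
    follower_stop = (v_follower * reaction_time
                     + v_follower ** 2 / (2.0 * max_decel))
    return gap + leader_stop - follower_stop


def is_safe(gap, v_follower, v_leader,
            max_decel=4.0, reaction_time=0.5, min_gap=2.0):
    """True if the follower can stop without closing within min_gap."""
    if gap < min_gap:
        return False
    remaining = final_gap(gap, v_follower, v_leader, max_decel, reaction_time)
    return remaining >= min_gap


def max_safe_speed(gap, v_leader,
                   max_decel=4.0, reaction_time=0.5, min_gap=2.0):
    """Largest follower speed (m/s) for which final_gap equals min_gap.

    Obtained by solving the quadratic final_gap(...) = min_gap for
    v_follower and taking the non-negative root.
    """
    slack = gap - min_gap
    if slack < 0:
        return 0.0
    bt = max_decel * reaction_time
    return -bt + math.sqrt(bt ** 2 + 2.0 * max_decel * slack + v_leader ** 2)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(is_safe(gap=50.0, v_follower=20.0, v_leader=20.0))
    print(is_safe(gap=5.0, v_follower=30.0, v_leader=10.0))
    print(max_safe_speed(gap=50.0, v_leader=20.0))
```

A bound of this kind could, for example, be used to clip the longitudinal action proposed by a learned policy so that the safety condition holds by construction. Whether SECRM-2D enforces its constraints in this way is not stated in the abstract.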
