Autonomous vehicles have to obey traffic rules. These rules are often formalized using temporal logic, which results in constraints that are hard to solve for optimization-based motion planners. Reinforcement Learning (RL) is a promising method for finding motion plans that adhere to temporal logic specifications. However, vanilla RL algorithms rely on random exploration, which is inherently unsafe. To address this issue, we propose a provably safe RL approach that always complies with traffic rules. As a specific application area, we consider vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). We introduce an efficient verification approach that determines whether actions comply with the COLREGS formalized in temporal logic. Our action verification is integrated into the RL process so that the agent only selects verified actions. In contrast to agents that only integrate traffic rule information into the reward function, our provably safe agent always complies with the formalized rules in critical maritime traffic situations and, thus, never causes a collision.
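Restricting an RL agent to verified actions is commonly realized as action masking: the verifier filters the action set before the agent chooses. The Python sketch below illustrates that general idea for a discrete action space with epsilon-greedy exploration. It is not the authors' implementation. The names `select_verified_action` and `toy_verifier`, the action encoding (0 = keep course, 1 = alter course to starboard, 2 = alter course to port), and the fallback action are all hypothetical, and `toy_verifier` is only a crude stand-in for a temporal-logic COLREGS check.

```python
import random
from typing import Callable, Hashable, Sequence

# Hypothetical verifier signature: (state, action) -> True if the action is
# verified to comply with the formalized traffic rules in that state.
Verifier = Callable[[Hashable, int], bool]


def select_verified_action(
    state: Hashable,
    q_values: Sequence[float],
    is_verified: Verifier,
    epsilon: float,
    rng: random.Random,
    fallback_action: int,
) -> int:
    """Epsilon-greedy action selection restricted to verified actions."""
    # Action mask: keep only the discrete actions the verifier accepts.
    allowed = [a for a in range(len(q_values)) if is_verified(state, a)]
    if not allowed:
        # Assumption: a designated fail-safe action is used if nothing verifies.
        return fallback_action
    if rng.random() < epsilon:
        # Exploration is still random, but only over verified actions.
        return rng.choice(allowed)
    # Exploitation: highest Q-value among verified actions only.
    return max(allowed, key=lambda a: q_values[a])


def toy_verifier(state: Hashable, action: int) -> bool:
    """Crude stand-in for a COLREGS check (not a temporal-logic verifier).

    Actions: 0 = keep course, 1 = alter course to starboard, 2 = alter to port.
    In a head-on encounter only the starboard alteration is accepted.
    """
    if state == "head_on":
        return action == 1
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    # Unmasked greedy choice would be action 0 (Q = 0.9); the mask forces 1.
    print(select_verified_action("head_on", [0.9, 0.1, 0.5], toy_verifier, 0.0, rng, fallback_action=0))  # prints 1
```

Because the mask is applied before both the random and the greedy branch, an unverified action is never executed, even during exploration. An agent that only encodes the rules in its reward can still pick a rule-violating action and is penalized only afterwards, which is the contrast the abstract draws.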