Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL does not guarantee safety. In recent years, several methods have been proposed to provide safety guarantees for RL by design. Yet, there is no comprehensive comparison of these provably safe RL methods. We therefore introduce a categorization of existing provably safe RL methods, present the theoretical foundations for both continuous and discrete action spaces, and benchmark the methods' performance empirically. The methods are categorized by how the safety method adapts the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task show that all provably safe methods are indeed always safe. Furthermore, their trained performance is comparable to that of unsafe baselines. The benchmarking suggests that different provably safe RL approaches should be selected depending on safety specifications, RL algorithms, and type of action space.
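To make the three categories concrete, the following is a minimal sketch on a toy problem, not the implementation benchmarked in the paper. It assumes a hypothetical one-dimensional system x' = x + a with the safety specification |x'| <= 1, so the safe action set at each state is an interval. The names `safe_interval`, `action_replacement`, `action_projection`, and `action_masking` are illustrative assumptions, as is the fallback controller.

```python
import numpy as np


def safe_interval(state):
    """Hypothetical safe action set for x' = x + a with |x'| <= 1."""
    return -1.0 - state, 1.0 - state


def action_replacement(state, action, fallback):
    """Keep the agent's action if it is safe, else use a fallback action."""
    lo, hi = safe_interval(state)
    return action if lo <= action <= hi else fallback(state)


def action_projection(state, action):
    """Return the safe action closest to the agent's action.

    For an interval, the Euclidean projection reduces to clipping.
    """
    lo, hi = safe_interval(state)
    return float(np.clip(action, lo, hi))


def action_masking(state, logits, actions):
    """Discrete case: renormalize the policy over safe actions only."""
    lo, hi = safe_interval(state)
    mask = (actions >= lo) & (actions <= hi)
    assert mask.any(), "sketch assumes at least one safe action exists"
    masked = np.where(mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())  # unsafe entries become 0
    return probs / probs.sum()


# Assumed fallback: steer the state back to the origin, which is always safe here.
fallback = lambda s: -s

state, unsafe_action = 0.5, 0.8  # safe interval is [-1.5, 0.5]
replaced = action_replacement(state, unsafe_action, fallback)
projected = action_projection(state, unsafe_action)
probs = action_masking(state, np.zeros(3), np.array([-1.0, 0.0, 1.0]))
```

In this sketch, replacement discards the unsafe action entirely, projection changes it as little as possible, and masking prevents the policy from ever proposing an unsafe discrete action.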