Reinforcement learning (RL) has achieved tremendous success in many complex decision-making tasks. Deploying RL in the real world, however, raises safety concerns, and domains such as autonomous driving and robotics have created a growing demand for safe RL algorithms. While safety control has a long history, the study of safe RL algorithms is still in its early stages. To establish a good foundation for future research in this direction, we review safe RL from the perspectives of methods, theory, and applications. First, we review the progress of safe RL along five dimensions and identify five problems that are crucial for deploying safe RL in real-world applications, which we term "2H3W". Second, we analyze progress in theory and algorithms from the perspective of answering the "2H3W" problems. Third, we review and discuss the sample complexity of safe RL methods, and then introduce the applications and benchmarks of safe RL algorithms. Finally, we discuss open challenges in safe RL, hoping to inspire further research in this direction. To advance the study of safe RL algorithms, we release a benchmark suite: an open-source repository containing implementations of major safe RL algorithms, along with tutorials, at https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.