Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are among the most common constraints in real-world applications and among the most challenging in Safe RL. Enforcing state-wise constraints is essential to many challenging tasks, such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of the State-wise Constrained Markov Decision Process (SCMDP), we discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize the limitations of current methods and discuss potential future directions.