This work surveys the state of the art in continual online safe reinforcement learning (COSRL). We discuss theoretical aspects, challenges, and open questions in building COSRL algorithms. We present a taxonomy of COSRL methods, with details of each, organized by the type of safe learning mechanism used to account for adaptation to nonstationarity. We categorize the formulations of safety constraints for online reinforcement learning algorithms and, finally, discuss prospects for creating reliable, safe online learning algorithms.

Keywords: safe RL in nonstationary environments, safe continual reinforcement learning under nonstationarity, HM-MDP, NSMDP, POMDP, safe POMDP, constraints for continual learning, safe continual reinforcement learning review, safe continual reinforcement learning survey, safe continual reinforcement learning, safe online learning under distribution shift, safe continual online adaptation, safe reinforcement learning, safe exploration, safe adaptation, constrained Markov decision processes, partially observable Markov decision process, safe reinforcement learning and hidden Markov decision processes, safe online reinforcement learning, safe meta-learning, safe meta-reinforcement learning, safe context-based reinforcement learning, formulating safety constraints for continual learning