This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges in developing applicable RL algorithms: satisfying safety constraints and learning efficiently from limited samples. To guarantee safety in complex real-world environments, we use the safe set algorithm (SSA) to monitor and modify the nominal controls, and we evaluate SSA+RL in a clustered dynamic environment that existing RL algorithms struggle to solve. However, the SSA+RL framework is usually not sample-efficient, especially in reward-sparse environments, a limitation that previous safe RL work has not addressed. To improve learning efficiency, we propose three techniques: (1) avoiding overly conservative behavior by adapting the SSA; (2) encouraging safe exploration using random network distillation with safety constraints; and (3) improving policy convergence by treating SSA controls as expert demonstrations and learning directly from them. The experimental results show that our framework achieves better safety performance during training than other safe RL methods and solves the task in substantially fewer episodes. Project website: https://hychen-naza.github.io/projects/Safe_RL/.
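To make the "monitor and modify the nominal controls" step concrete, the following is a minimal sketch of a generic safety filter, not the authors' implementation. It assumes the safety constraint has already been reduced to a single linear inequality on the control, grad . u <= bound; the function name `safe_control` and its parameters are hypothetical. Monitoring corresponds to checking the inequality, and modifying corresponds to projecting the nominal control onto the feasible half-space, which changes the control as little as possible in the Euclidean sense.

```python
from typing import List, Sequence


def safe_control(u_ref: Sequence[float],
                 grad: Sequence[float],
                 bound: float) -> List[float]:
    """Return the control closest to u_ref that satisfies grad . u <= bound.

    Hypothetical helper: the single linear constraint is an assumption
    made for illustration, not a detail taken from the abstract.
    """
    # Monitor: how far does the nominal control violate the constraint?
    violation = sum(g * u for g, u in zip(grad, u_ref)) - bound
    if violation <= 0.0:
        # Nominal control is already safe, so pass it through unchanged.
        return list(u_ref)

    norm_sq = sum(g * g for g in grad)
    if norm_sq == 0.0:
        # The constraint does not depend on the control, so no control fixes it.
        raise ValueError("constraint cannot be satisfied by changing the control")

    # Modify: Euclidean projection of u_ref onto the half-space boundary.
    scale = violation / norm_sq
    return [u - scale * g for u, g in zip(u_ref, grad)]


if __name__ == "__main__":
    print(safe_control([1.0, 0.0], [1.0, 0.0], 0.5))  # nominal control is modified
    print(safe_control([0.2, 0.3], [1.0, 0.0], 0.5))  # nominal control is kept
```

In an SSA+RL loop of the kind the abstract describes, the RL policy would supply `u_ref` at every step and the filtered control would be sent to the environment; how the constraint itself is derived from a safety index is outside the scope of this sketch.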