Safety has been recognized as the central obstacle to using reinforcement learning (RL) in real-world applications. Various methods have been developed to address safety concerns in RL. However, learning reliable RL-based solutions usually requires a large number of interactions with the environment. Moreover, how to improve learning efficiency, and specifically how to use transfer learning for safe reinforcement learning, has not been well studied. In this work, we propose an adaptive aggregation framework for safety-critical control. Our method comprises two key techniques: 1) we learn to transfer safety knowledge by aggregating multiple source tasks and a target task through an attention network; 2) we separate the goals of improving task performance and reducing constraint violations by using a safeguard. Experimental results demonstrate that our algorithm incurs fewer safety violations and achieves better data efficiency than several baselines.
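The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attention network or the safeguard, so `score_fn` (a stand-in for a learned attention score), `cost_fn` (a predicted constraint cost), `backup_policy`, and `threshold` are all hypothetical names introduced here. The sketch assumes each source or target policy maps a state to a continuous action, mixes the proposals with softmax weights, and then lets a separate safeguard override the mixed action when its predicted cost is too high.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_actions(state, policies, score_fn):
    """Mix the actions proposed by all policies with attention-style weights.

    score_fn(state, action) -> float is a hypothetical stand-in for a
    learned attention network (assumption, not from the paper).
    """
    proposals = [policy(state) for policy in policies]
    weights = softmax([score_fn(state, a) for a in proposals])
    dim = len(proposals[0])
    mixed = [sum(w * a[i] for w, a in zip(weights, proposals)) for i in range(dim)]
    return mixed, weights


def safeguard(state, action, cost_fn, backup_policy, threshold):
    """Keep the action if its predicted cost is within the threshold;
    otherwise fall back to the backup policy (assumed design).

    Returns (action_to_execute, was_overridden).
    """
    if cost_fn(state, action) <= threshold:
        return action, False
    return backup_policy(state), True


if __name__ == "__main__":
    # Toy 1-D setting: state[0] is a position that must stay in [-1, 1].
    state = [0.9]
    sources = [lambda s: [1.0], lambda s: [-1.0]]   # hypothetical source policies
    target = lambda s: [0.5]                        # hypothetical target policy
    score = lambda s, a: -abs(s[0] + 0.2 * a[0])    # prefers staying near 0
    cost = lambda s, a: float(abs(s[0] + 0.2 * a[0]) > 1.0)
    backup = lambda s: [-1.0 if s[0] > 0 else 1.0]

    action, weights = aggregate_actions(state, sources + [target], score)
    final, overridden = safeguard(state, action, cost, backup, threshold=0.0)
    print("weights:", weights)
    print("aggregated:", action, "executed:", final, "overridden:", overridden)
```

Keeping the safeguard outside the aggregation step mirrors the separation described in the abstract: the mixing weights can be tuned for task performance, while constraint handling lives in a distinct component.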