Guaranteeing the safe behaviour of reinforcement learning (RL) policies poses significant challenges for safety-critical applications, despite RL's generality and scalability. To address this, we propose a new approach that applies verification methods from control theory to learned value functions. By analyzing the structure of safety preservation tasks, we formalize original theorems that establish links between value functions and control barrier functions. Further, we propose novel metrics for verifying value functions in safe control tasks, along with practical implementation details that improve learning. Our work presents a novel method for certificate learning, which unlocks a range of verification techniques from control theory for RL policies and marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems. Code and videos are available at https://rl-cbf.github.io/