Although reinforcement learning (RL) is highly general and scalable, the difficulty of verifying policy behaviours poses challenges for safety-critical applications. To remedy this, we propose applying verification methods from control theory to learned value functions. By analyzing a simple task structure for safety preservation, we derive original theorems linking value functions to control barrier functions. Building on these results, we propose novel metrics for verifying value functions in safe control tasks, along with practical implementation details that improve learning. Besides proposing a novel method for certificate learning, our work makes a wealth of control-theoretic verification methods available to RL policies, and represents a first step towards a framework for general, scalable, and verifiable design of control systems.