In recent years, deep reinforcement learning (DRL) approaches have produced highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and other safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of DRL controllers, which can help ensure their safe behavior. We showcase a design-for-verification approach utilizing k-induction and demonstrate its use in verifying liveness properties. In addition, we give a brief overview of neural Lyapunov barrier certificates and summarize their capabilities on a case study. Finally, we describe several other novel reachability-based approaches which, despite failing to provide the guarantees of interest here, could be effective for verifying other DRL systems and may be of further interest to the community.