A runtime assurance (RTA) system for a given plant allows an untrusted or experimental controller to be exercised while a backup (or safety) controller assures safety. The relevant computational design problem is to create a switching logic that assures safety by handing control to the safety controller as needed, while maximizing a performance criterion such as the utilization of the untrusted controller. Existing RTA design strategies are well known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning; it can guarantee safety and leverages machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability-based and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.
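To make the RTA setting concrete, the following is a minimal sketch of a switching loop on a toy plant. It is not the reinforcement-learning method of the paper: the switching rule here is a hand-written one-step lookahead, and the plant (a 1-D single integrator), both controllers, the safety limit, the margin, and all function names are hypothetical choices made only for illustration. The sketch shows the two quantities the design problem trades off: whether the trajectory stays safe, and what fraction of steps the untrusted controller is used.

```python
from typing import Callable, Tuple

State = float
Controller = Callable[[State], float]

DT = 0.1  # assumed integration step for the toy plant


def step(x: State, u: float) -> State:
    """Hypothetical plant: 1-D single integrator, x_next = x + u * DT."""
    return x + u * DT


def untrusted(x: State) -> float:
    """Illustrative untrusted controller: always drives toward the limit."""
    return 1.0


def safety(x: State) -> float:
    """Illustrative safety controller: always retreats from the limit."""
    return -1.0


def use_untrusted(x: State, limit: float, margin: float) -> bool:
    """Hand-written switching rule (an assumption, not the learned policy):
    allow the untrusted controller only if one step under it stays at least
    `margin` below the limit."""
    return step(x, untrusted(x)) <= limit - margin


def run_rta(x0: State, limit: float = 1.0, margin: float = 0.05,
            steps: int = 200) -> Tuple[float, bool]:
    """Simulate the switched system; return (utilization, stayed_safe)."""
    x = x0
    used = 0
    safe = x <= limit
    for _ in range(steps):
        if use_untrusted(x, limit, margin):
            u = untrusted(x)
            used += 1
        else:
            u = safety(x)
        x = step(x, u)
        safe = safe and x <= limit  # safe set assumed to be x <= limit
    return used / steps, safe


if __name__ == "__main__":
    utilization, safe = run_rta(0.0)
    print(f"utilization={utilization:.2f}, safe={safe}")
```

In this toy setting, a larger `margin` makes the switch more conservative and lowers utilization, which mirrors the conservatism the abstract attributes to existing strategies. An optimal RTA, in the abstract's sense, would choose the switching logic that keeps the system safe while maximizing that utilization.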