With the increasing use of neural policies in control systems, ensuring their safety and reliability has become a critical software engineering task. One prevalent approach to ensuring the safety of neural policies is to deploy programmatic runtime shields alongside them to correct their unsafe commands. However, the programmatic runtime shields synthesized by existing methods are either computationally expensive or insufficiently permissive, resulting in high overhead and unnecessary interventions on the system. To address these challenges, we propose Aegis, a novel framework that synthesizes lightweight and permissive programmatic runtime shields for neural policies. Aegis achieves this by formulating the seeking of a runtime shield as a sketch-based program synthesis problem and proposing a novel method that leverages counterexample-guided inductive synthesis and Bayesian optimization to solve it. To evaluate Aegis and its synthesized shields, we use eight representative control systems and compare Aegis with the current state-of-the-art. Our results show that the programmatic runtime shields synthesized by Aegis can correct all unsafe commands from neural policies, ensuring that the systems do not violate any desired safety properties at all times. Compared to the current state-of-the-art, Aegis's shields exhibit a 2.2$\times$ reduction in time overhead and a 3.9$\times$ reduction in memory usage, suggesting that they are much more lightweight. Moreover, Aegis's shields incur an average of 1.5$\times$ fewer interventions than other shields, showing better permissiveness.
翻译:随着神经网络策略在控制系统中的应用日益广泛,确保其安全性和可靠性已成为一项关键的软件工程任务。保障神经策略安全性的一种主流方法是部署程序化运行时防护罩以修正其不安全指令。然而,现有方法合成的程序化运行时防护罩要么计算开销高昂,要么限制过于严格,导致系统产生高额开销和不必要的干预。为应对这些挑战,我们提出了Aegis——一个为神经策略合成轻量级且宽松的程序化运行时防护罩的新型框架。Aegis通过将运行时防护罩的构建问题形式化为基于草图的程序合成问题,并提出一种结合反例引导归纳合成与贝叶斯优化的新方法来解决该问题。为评估Aegis及其合成防护罩的性能,我们在八个代表性控制系统上进行实验,并将Aegis与当前最先进方法进行对比。实验结果表明,Aegis合成的程序化运行时防护罩能够修正神经策略产生的所有不安全指令,确保系统始终不违反任何预期的安全属性。与当前最优方法相比,Aegis防护罩的时间开销降低了2.2倍,内存使用量减少了3.9倍,表明其具有更高的轻量化程度。此外,Aegis防护罩的平均干预次数比其他防护罩减少1.5倍,展现出更优的宽松性。