Deep reinforcement learning (DRL) algorithms are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety, as testing does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is agnostic to the type of the DRL agent's inputs. Further, it is designed to be black-box (as it does not require access to the internals or training data of the agent) and leverages state abstraction to reduce the state space, thus facilitating the learning of safety violation prediction models from the agent's states. We quantitatively and qualitatively validated SMARLA on three well-known RL case studies. Empirical results reveal that SMARLA achieves accurate violation prediction with a low false positive rate, and can predict safety violations at an early stage, approximately halfway through the agent's execution, before violations occur.
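The abstraction-based monitoring idea can be sketched as follows. This is a minimal illustration, not SMARLA's actual implementation: the grid abstraction, the episode data, and the frequency-based predictor standing in for a learned model are all illustrative assumptions. It shows only the core mechanism the abstract describes, namely mapping concrete agent states to a reduced abstract state space and predicting violations from observed abstract states.

```python
from collections import defaultdict

def abstract_state(state, cell=0.5):
    # Illustrative abstraction: bucket each continuous dimension
    # into a coarse grid cell, shrinking the state space the
    # violation predictor must learn over.
    return tuple(int(x // cell) for x in state)

class ViolationMonitor:
    """Frequency-based stand-in for a learned violation predictor."""

    def __init__(self):
        # abstract state -> [violation count, visit count]
        self.counts = defaultdict(lambda: [0, 0])

    def fit(self, episodes):
        # Each episode: (sequence of concrete states, violation flag).
        for states, violated in episodes:
            for s in states:
                a = abstract_state(s)
                self.counts[a][1] += 1
                if violated:
                    self.counts[a][0] += 1

    def violation_probability(self, state):
        # Estimated probability that episodes passing through this
        # abstract state end in a safety violation.
        v, n = self.counts[abstract_state(state)]
        return v / n if n else 0.0

# Hypothetical training episodes: two safe, one ending in a violation.
episodes = [
    ([(0.1, 0.2), (0.3, 0.4)], False),
    ([(0.2, 0.1), (0.4, 0.3)], False),
    ([(2.1, 2.2), (2.3, 2.4)], True),
]
monitor = ViolationMonitor()
monitor.fit(episodes)
# States near the violating episode map to high-risk abstract states.
print(monitor.violation_probability((2.2, 2.3)))
print(monitor.violation_probability((0.2, 0.2)))
```

Because the predictor operates on abstract states, it can flag a running episode as soon as it enters a region associated with past violations, which is what enables early prediction mid-execution.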