Deep reinforcement learning (DRL) algorithms are increasingly used in safety-critical systems, where ensuring the safety of DRL agents is a critical concern. Testing alone is not sufficient, however, because it offers no guarantees. Building safety monitors is one way to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is designed to be black-box: it does not require access to the internals or training data of the agent. It leverages state abstraction to reduce the state space and thus facilitate the learning of safety violation prediction models from the agent's states. We validated SMARLA on two well-known RL case studies. Empirical analysis reveals that SMARLA achieves accurate violation prediction with a low false positive rate and can predict safety violations at an early stage, approximately halfway through the agent's execution, before violations occur.
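The monitoring idea described in the abstract can be illustrated with a small sketch. This is not SMARLA's implementation: the grid-based abstraction, the frequency-based risk estimate, and every name below (`abstract_state`, `ViolationMonitor`, `width`, `threshold`) are hypothetical stand-ins chosen only to show how a black-box monitor might map observed states to abstract states and raise an alarm before an episode ends.

```python
from collections import defaultdict


def abstract_state(state, width=0.5):
    """Map a concrete state (tuple of floats) to a coarse grid cell.

    Assumption: simple discretization stands in for the paper's state abstraction.
    """
    return tuple(int(x // width) for x in state)


class ViolationMonitor:
    """Toy black-box monitor: it only sees states, never the agent's internals."""

    def __init__(self, width=0.5, threshold=0.5):
        self.width = width
        self.threshold = threshold
        # abstract state -> [episodes containing it, unsafe episodes containing it]
        self.counts = defaultdict(lambda: [0, 0])

    def fit(self, episodes):
        """episodes: iterable of (list_of_states, violated) pairs."""
        for states, violated in episodes:
            # Count each abstract state once per episode.
            for cell in {abstract_state(s, self.width) for s in states}:
                self.counts[cell][0] += 1
                self.counts[cell][1] += int(violated)

    def risk(self, states):
        """Highest observed violation rate among the abstract states visited so far."""
        rates = []
        for s in states:
            cell = abstract_state(s, self.width)
            if cell in self.counts:
                seen, unsafe = self.counts[cell]
                rates.append(unsafe / seen)
        return max(rates, default=0.0)

    def first_alarm(self, states):
        """Index of the first time step whose risk reaches the threshold, else None."""
        for t in range(1, len(states) + 1):
            if self.risk(states[:t]) >= self.threshold:
                return t - 1
        return None


# Two hand-made episodes (illustrative data, not results from the paper).
safe_episode = ([(0.1, 0.1), (0.2, 0.3), (0.4, 0.4)], False)
unsafe_episode = ([(0.1, 0.1), (1.2, 0.1), (2.3, 0.1)], True)

monitor = ViolationMonitor(width=0.5, threshold=0.9)
monitor.fit([safe_episode, unsafe_episode])
alarm_step = monitor.first_alarm(unsafe_episode[0])
```

In this toy setup the alarm fires at the second step of the three-step unsafe episode, before its final state, which mirrors the early-prediction goal stated above. A real monitor would replace the frequency table with a learned prediction model.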