Deep reinforcement learning (DRL) algorithms are increasingly used in safety-critical systems, where ensuring the safety of DRL agents is a critical concern. Testing alone is not sufficient, however, because it offers no safety guarantees. Building safety monitors is one way to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is designed to be black-box: it does not require access to the internals of the agent. It leverages state abstraction to reduce the state space and thus facilitate the learning of safety violation prediction models from the agent's states. We validated SMARLA on two well-known RL case studies. Empirical analysis reveals that SMARLA predicts violations accurately with a low false positive rate, and that it can predict safety violations at an early stage, approximately halfway through the agent's execution, before the violations occur.
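To make the idea of a black-box monitor built on state abstraction more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical grid-based abstraction (`abstract_state`) and a simple frequency-based estimate standing in for the learned violation prediction model; the names `ViolationMonitor`, `bin_width`, and `threshold` are illustrative only. The monitor sees nothing but the states the agent visits, which is what makes it black-box.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

State = Sequence[float]
AbstractState = Tuple[int, ...]


def abstract_state(state: State, bin_width: float = 0.5) -> AbstractState:
    """Map a concrete state to a coarse grid cell (an assumed abstraction)."""
    return tuple(int(x // bin_width) for x in state)


class ViolationMonitor:
    """Toy monitor: scores an episode prefix by the abstract states seen so far."""

    def __init__(self, bin_width: float = 0.5, threshold: float = 0.9) -> None:
        self.bin_width = bin_width
        self.threshold = threshold
        self._seen: Dict[AbstractState, int] = defaultdict(int)
        self._unsafe: Dict[AbstractState, int] = defaultdict(int)

    def fit(self, episodes: List[List[State]], labels: List[bool]) -> None:
        """Count, per abstract state, how many episodes visited it and how many of those ended in a violation."""
        for states, violated in zip(episodes, labels):
            cells = {abstract_state(s, self.bin_width) for s in states}
            for cell in cells:
                self._seen[cell] += 1
                self._unsafe[cell] += int(violated)

    def risk(self, prefix: Sequence[State]) -> float:
        """Highest empirical violation rate among abstract states in the prefix."""
        rates = []
        for s in prefix:
            cell = abstract_state(s, self.bin_width)
            seen = self._seen.get(cell, 0)
            if seen:  # unseen abstract states contribute no evidence
                rates.append(self._unsafe.get(cell, 0) / seen)
        return max(rates, default=0.0)

    def alarm_step(self, episode: Sequence[State]) -> Optional[int]:
        """Return the first step (1-based) at which the risk reaches the threshold."""
        for t in range(1, len(episode) + 1):
            if self.risk(episode[:t]) >= self.threshold:
                return t
        return None


# Two made-up training episodes: one safe, one ending in a violation.
safe = [(0.1, 0.1), (0.6, 0.2), (1.1, 0.3)]
unsafe = [(0.1, 0.1), (0.2, 1.2), (0.3, 2.6)]

monitor = ViolationMonitor(bin_width=0.5, threshold=0.9)
monitor.fit([safe, unsafe], [False, True])

print(monitor.alarm_step(unsafe))
print(monitor.alarm_step(safe))
```

In this toy setting the alarm is raised before the final step of the unsafe episode, which mirrors the goal of early prediction. A practical monitor would replace the frequency table with a trained classifier and choose the abstraction and threshold empirically.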