Any autonomous controller will be unsafe in some situations. The ability to quantitatively identify when these unsafe situations are about to occur is crucial for drawing timely human oversight in applications such as freight transportation. In this work, we demonstrate that the true criticality of an agent's situation can be robustly defined as the mean reduction in reward given some number of random actions. Proxy criticality metrics that are computable in real time (i.e., without actually simulating the effects of random actions) can be compared to the true criticality, and we show how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. We evaluate our approach on learned policies from APE-X and A3C within an Atari environment, and demonstrate how safety margins decrease as agents approach failure states. The integration of safety margins into programs for monitoring deployed agents allows for the real-time identification of potentially catastrophic situations.
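To make the definition of true criticality concrete, the following is a minimal Python sketch that estimates it by Monte Carlo rollouts: the return obtained by following the policy is compared with the mean return when the first `n_random` actions are replaced by uniformly random ones. The `Corridor` environment, the always-move-away `policy`, the horizon, and the sample count are all hypothetical stand-ins chosen for illustration; they are not the Atari setup or the APE-X and A3C policies evaluated in this work.

```python
import random
from statistics import mean


class Corridor:
    """Hypothetical toy environment: position 0 is a failure state."""

    def __init__(self, position, size=10):
        self.position = position
        self.size = size
        self.failed = position <= 0

    def step(self, action):
        """Action 0 moves toward the failure state, 1 moves away.

        Returns 1.0 for every step survived and 0.0 once failed.
        """
        if self.failed:
            return 0.0
        self.position += 1 if action == 1 else -1
        self.position = min(self.position, self.size)
        if self.position <= 0:
            self.failed = True
            return 0.0
        return 1.0


def policy(env):
    """Assumed stand-in for a learned policy: always move away from failure."""
    return 1


def rollout(start, n_random, horizon, rng):
    """Total reward when the first n_random actions are uniformly random."""
    env = Corridor(start)
    total = 0.0
    for t in range(horizon):
        action = rng.choice((0, 1)) if t < n_random else policy(env)
        total += env.step(action)
    return total


def true_criticality(start, n_random, horizon=20, samples=2000, seed=0):
    """Mean reduction in return caused by n_random random actions."""
    rng = random.Random(seed)
    # Policy and environment are deterministic here, so a single baseline
    # rollout suffices; a stochastic setting would average this as well.
    baseline = rollout(start, 0, horizon, rng)
    perturbed = mean(
        rollout(start, n_random, horizon, rng) for _ in range(samples)
    )
    return baseline - perturbed
```

In this toy setting, `true_criticality(1, 1)` is large because a single random action next to the failure state can end the episode, whereas `true_criticality(5, 1)` is zero because one random step cannot reach failure. A proxy metric, in the sense used above, would aim to predict this quantity without running the perturbed rollouts.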