In recent years, advances in deep learning have led to many successes in using reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing RL-based systems are essentially competency-unaware: they lack the interpretation mechanisms needed to give human operators an insightful, holistic view of their competence. This impedes their adoption, particularly in critical applications where an agent's decisions can have significant consequences. Towards more explainable Deep RL (xDRL), we propose a new framework based on analyses of interestingness. Our tool provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLlib toolkit. We showcase the framework by applying the proposed pipeline to a set of scenarios of varying complexity. Using global and local analyses of interestingness, we empirically assess the approach's ability to identify agent behavior patterns, competency-controlling conditions, and the task elements most responsible for an agent's competence. Overall, we show that our framework can provide agent designers with insights about RL agent competence, covering both capabilities and limitations, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.