As more people without AI expertise use complex AI systems for daily tasks, there has been a growing effort to develop methods that explain AI decision making in terms such users can understand. Toward this goal, producing concept-based explanations that leverage higher-level concepts has become a popular approach. Most concept-based explanation methods have been developed for classification, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute desiderata for defining concepts in sequential decision-making settings. Additionally, inspired by the Protégé Effect, which states that explaining knowledge often reinforces one's own learning, we explore how concept-based explanations of an RL agent's decision making can in turn improve the agent's learning rate, as well as improve end-user understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that learns a joint embedding model between state-action pairs and concept-based explanations, and leverages the learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end users at deployment for improved task performance. Our experimental validations in Connect 4 and Lunar Lander demonstrate that S2E provides a dual benefit: it successfully informs reward shaping and improves the agent's learning rate, and it significantly improves end-user task performance at deployment time.
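To make the dual use of a joint embedding concrete, the following is a minimal illustrative sketch, not the S2E implementation. It assumes that a state-action pair and each concept-based explanation have already been embedded into a shared vector space. The concept texts, the three-dimensional vectors, the shaping values, and the function names `retrieve_explanation` and `shaped_reward` are all hypothetical, chosen only to show how one retrieval step could feed both reward shaping and a user-facing explanation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical concept explanations for a Connect 4-like game.
# Each entry maps explanation text -> (made-up embedding, made-up shaping value).
CONCEPTS = {
    "blocks an opponent win": ([1.0, 0.0, 0.0], 0.5),
    "creates three in a row": ([0.0, 1.0, 0.0], 0.3),
    "no notable concept": ([0.0, 0.0, 1.0], 0.0),
}

def retrieve_explanation(state_action_embedding):
    """Return the explanation whose embedding is closest to the state-action embedding."""
    return max(
        CONCEPTS,
        key=lambda text: cosine(state_action_embedding, CONCEPTS[text][0]),
    )

def shaped_reward(env_reward, state_action_embedding):
    """Add the retrieved concept's (assumed) shaping value to the environment reward.

    Returns the shaped reward together with the explanation text, so the same
    retrieval can be reused to explain the action to an end user.
    """
    explanation = retrieve_explanation(state_action_embedding)
    return env_reward + CONCEPTS[explanation][1], explanation
```

In this sketch, a state-action embedding close to the first concept vector, such as `[0.9, 0.1, 0.0]`, retrieves "blocks an opponent win" and receives that concept's shaping value on top of the environment reward. A learned model would replace the hand-written vectors, and how retrieved concepts map to shaping values is an assumption here rather than something the abstract specifies.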