Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL, in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information about why an action is chosen. We propose a technique, Experiential Explanations, that generates counterfactual explanations by training influence predictors alongside the RL policy. Influence predictors are models that learn how sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. In a human evaluation study, participants presented with experiential explanations were better able to correctly predict what an agent would do than those presented with other standard types of explanation. Participants also found experiential explanations to be more understandable, satisfying, complete, useful, and accurate. A qualitative analysis provides insights into the factors of experiential explanations that are most useful.