While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are aimed at developers and expert users. Counterfactual explanations are local explanations that tell users what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While counterfactuals have been extensively researched in supervised learning, few methods apply them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of this powerful explanation method in RL. We start by reviewing current work on counterfactual explanations in supervised learning. We then explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent the adoption of methods from supervised learning in RL. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.