Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions and counterfactual explanations. These classes of approaches have largely been studied independently, and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactual explanations. After motivating operative changes to Shapley-value-based feature attributions and counterfactual explanations, we prove that, under certain conditions, they are in fact equivalent. We then extend the equivalence result to game-theoretic solution concepts beyond Shapley values. Moreover, by analyzing the conditions for this equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.