Enhancing Counterfactual Explanation Search with Diffusion Distance and Directional Coherence

Source: arXiv. This work has been accepted for presentation at the 2nd World Conference on eXplainable Artificial Intelligence (xAI 2024), July 17-19, 2024, Valletta, Malta.

A pressing issue in the adoption of AI models is the growing demand for more human-centric explanations of their predictions. Understanding how humans produce and select explanations has helped advance this goal. In this work, inspired by insights from human cognition, we propose and test two novel biases that enhance the search for effective counterfactual explanations. Central to our methodology is diffusion distance, which emphasizes data connectivity and actionability in the search for feasible counterfactual explanations. In particular, diffusion distance gives more weight to points that are interconnected by numerous short paths. This brings closely connected points nearer to each other and identifies a feasible path between them. We also introduce a directional coherence term that expresses a preference for alignment between the joint and marginal directional changes in feature space needed to reach a counterfactual. This term enables the generation of counterfactual explanations that align with a set of marginal predictions, based on expectations of how the model's outcome varies when one feature is changed at a time. We evaluate our method, named Coherent Directional Counterfactual Explainer (CoDiCE), and the impact of the two novel biases against existing methods such as DiCE, FACE, Prototypes, and Growing Spheres. Through a series of ablation experiments on both synthetic and real datasets with continuous and mixed-type features, we demonstrate the effectiveness of our method.
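To make the connectivity intuition behind diffusion distance concrete, the following minimal sketch computes pairwise diffusion distances using the standard diffusion-map definition (a Gaussian kernel, a row-normalized random walk, and a comparison of t-step transition probabilities weighted by the stationary distribution). It is an illustration under stated assumptions and not necessarily the exact variant used in CoDiCE: the function name `diffusion_distances`, the bandwidth `epsilon`, the step count `t`, and the toy data are all hypothetical choices made here.

```python
import numpy as np


def diffusion_distances(X, epsilon=1.0, t=3):
    """Pairwise diffusion distances between the rows of X after t random-walk steps.

    Hypothetical helper for illustration; epsilon (kernel bandwidth) and
    t (number of steps) are assumed values, not taken from the paper.
    """
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinities: nearby points get weights close to 1.
    K = np.exp(-sq_dists / epsilon)
    degree = K.sum(axis=1)
    # Row-stochastic transition matrix of a random walk on the data graph.
    P = K / degree[:, None]
    # Stationary distribution of that random walk.
    pi = degree / degree.sum()
    # t-step transition probabilities.
    Pt = np.linalg.matrix_power(P, t)
    # D_t(i, j)^2 = sum_k (Pt[i, k] - Pt[j, k])^2 / pi[k]
    scaled = Pt / np.sqrt(pi)[None, :]
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


# Toy 1-D data: a query at 0, a target at 2 reached through a dense chain
# (0.5, 1.0, 1.5), and an isolated target at -2 with no intermediate points.
X = np.array([[-2.0], [0.0], [0.5], [1.0], [1.5], [2.0]])
D = diffusion_distances(X, epsilon=1.0, t=3)

isolated, query, connected = 0, 1, 5
print("Euclidean distance to both targets:", abs(X[query, 0] - X[connected, 0]))
print("Diffusion distance to connected target:", D[query, connected])
print("Diffusion distance to isolated target:", D[query, isolated])
```

Both targets are equally far from the query in Euclidean terms, but the target reached through the chain of intermediate points comes out nearer in diffusion distance. This is the property the abstract relies on when it favours counterfactuals that lie along well-connected paths in the data.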
