Counterfactual Explanation for Regression via Disentanglement in Latent Space

Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Because they are easy to understand, they have the potential to guide the user's interaction with AI systems. To be applicable, CEs need to be realistic and actionable. Various methods have been proposed in the literature to generate CEs. However, the majority of research on CEs focuses on classification problems, where questions like "What should I do to get my rejected loan approved?" are raised. In practice, many questions, such as "What should I do to increase my salary?", are regression problems by nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions with the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments on image and tabular datasets in regression settings, we demonstrate that the proposed method is competitive with respect to different quality measures. It efficiently returns results that lie closer to the original data manifold than those of three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.
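The idea of combining label-irrelevant latent dimensions with a predefined output can be illustrated with a minimal toy sketch. It is not the method of the paper: it assumes an idealized, perfectly disentangled linear latent space in which the first latent coordinate is the label itself, and all names (`DECODER`, `encode`, `decode`, `regressor`, `counterfactual`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumption: a fixed invertible linear map stands in for a learned decoder.
# Latent code = [label-relevant coordinate, two label-irrelevant coordinates].
DECODER = np.array([[2.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0],
                    [1.0, 0.0, 1.0]])  # determinant is 3, so it is invertible
ENCODER = np.linalg.inv(DECODER)


def encode(x):
    """Split the latent code of x into label-relevant and label-irrelevant parts."""
    z = ENCODER @ x
    return z[:1], z[1:]


def decode(z_relevant, z_irrelevant):
    """Map a (relevant, irrelevant) latent pair back to input space."""
    return DECODER @ np.concatenate([z_relevant, z_irrelevant])


def regressor(x):
    """Toy pre-trained regressor: reads the label-relevant latent coordinate."""
    return float((ENCODER @ x)[0])


def counterfactual(x_query, y_target):
    """Keep the label-irrelevant part of the query and plug in the target output."""
    _, z_irrelevant = encode(x_query)
    return decode(np.array([y_target]), z_irrelevant)


# A query sample whose label is 1.0, and a counterfactual for target 4.0.
x_query = decode(np.array([1.0]), np.array([0.5, -2.0]))
x_cf = counterfactual(x_query, y_target=4.0)

print("prediction for query:         ", regressor(x_query))
print("prediction for counterfactual:", regressor(x_cf))
print("irrelevant part preserved:    ",
      np.allclose(encode(x_cf)[1], encode(x_query)[1]))
```

In this toy setting the counterfactual reaches the target prediction while the label-irrelevant coordinates of the query are left unchanged. A real model would have to learn the encoder, the decoder and the disentanglement from data, which this sketch does not attempt.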
