Counterfactual explanations are a popular method of post-hoc explainability across a variety of machine learning settings. Such methods explain a classifier by generating new data points that are similar to a given reference point but receive a more desirable prediction. In this work, we investigate a framing for counterfactual generation methods that treats counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution. Through this framing, we derive a distance metric tailored for counterfactual similarity that can be applied to a broad range of settings. Through both quantitative and qualitative analyses of counterfactual generation methods, we show that this framing allows us to express more nuanced dependencies among the covariates.