Counterfactual explanations (CFEs) show how to minimally modify a feature vector so that an instance receives a different prediction. CFEs can enhance informational fairness and trustworthiness, and they provide suggestions for users who receive adverse predictions. However, recent research has shown that multiple CFEs can be offered for the same instance, or for instances that differ only slightly. Multiple CFEs give users flexible choices and cover diverse desiderata. At the same time, individual fairness and model reliability are damaged if unstable CFEs with different costs are returned. Existing methods fail to exploit this flexibility and address non-robustness simultaneously. To address these issues, we propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP). Specifically, CEMSP constrains the changed values of abnormal features using their semantically meaningful normal ranges. For efficiency, we model the problem as a Boolean satisfiability problem so as to modify as few features as possible. Additionally, CEMSP is a general framework that can easily accommodate further practical requirements, e.g., causality and actionability. Comprehensive experiments on both synthetic and real-world datasets demonstrate that, compared to existing methods, our method provides more robust explanations while preserving flexibility.
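The core idea, repairing as few abnormal features as possible by moving them into their normal ranges until the prediction flips, can be illustrated with a small sketch. This is not the CEMSP implementation: it replaces the Boolean satisfiability formulation with brute-force subset enumeration, and it assumes that a repaired feature is clamped to the nearest endpoint of its normal range. The function name `minimal_satisfiable_perturbations`, the toy classifier `predict`, and all feature values and ranges are hypothetical.

```python
from itertools import combinations


def minimal_satisfiable_perturbations(x, normal_ranges, predict, target):
    """Return every minimal set of abnormal features whose repair yields `target`.

    A set is minimal if no proper subset of it also flips the prediction.
    Brute-force enumeration stands in for a SAT-based search (assumption).
    """
    # Abnormal features are those lying outside their normal range.
    abnormal = [i for i, (lo, hi) in normal_ranges.items() if not lo <= x[i] <= hi]
    minimal = []
    # Enumerate subsets by increasing size so smaller solutions are found first.
    for k in range(1, len(abnormal) + 1):
        for subset in combinations(abnormal, k):
            chosen = set(subset)
            if any(found <= chosen for found in minimal):
                continue  # contains a known solution, so it cannot be minimal
            candidate = list(x)
            for i in subset:
                lo, hi = normal_ranges[i]
                # Assumption: clamp to the nearest endpoint of the normal range.
                candidate[i] = min(max(x[i], lo), hi)
            if predict(candidate) == target:
                minimal.append(chosen)
    return minimal


# Hypothetical toy setup: blood pressure, glucose, BMI with made-up normal ranges.
ranges = {0: (90.0, 120.0), 1: (4.0, 5.6), 2: (18.5, 24.9)}


def predict(v):
    """Toy risk model: weighted excess above each upper normal bound."""
    risk = (0.02 * max(v[0] - 120.0, 0.0)
            + 0.5 * max(v[1] - 5.6, 0.0)
            + 0.1 * max(v[2] - 24.9, 0.0))
    return "high" if risk > 1.0 else "low"


x = [150.0, 7.5, 30.0]
solutions = minimal_satisfiable_perturbations(x, ranges, predict, "low")
print(solutions)
```

In this toy setup no single repaired feature is enough to change the prediction, but every pair is, so the search returns several alternative explanations of equal size. Because each explanation is tied to fixed normal ranges rather than to the exact input values, a slightly perturbed instance with the same abnormal features would be mapped to the same repaired values, which is the intuition behind the robustness claim.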