Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering ``what if?'' questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $\delta$-safe, ensuring high predictive confidence, and $\epsilon$-robust, ensuring low predictive variance. Grounded in Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes; the explanations satisfying these guarantees form what we refer to as the $\langle \delta, \epsilon \rangle$-set. We integrate uncertainty-aware constraints into our optimization framework and validate our method empirically across diverse datasets. Compared against state-of-the-art Bayesian CE methods, PSCE produces counterfactual explanations that are not only more plausible and discriminative, but also provably robust under model change.
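As a rough illustration of the two guarantees named above, the membership test for the $\langle \delta, \epsilon \rangle$-set can be sketched as follows. This is a minimal sketch, not the paper's actual algorithm: the function name `in_delta_eps_set`, the representation of posterior samples as a list of callables returning the target-class probability, and the default thresholds are all illustrative assumptions.

```python
import numpy as np

def in_delta_eps_set(x_cf, posterior_models, delta=0.9, eps=0.01):
    """Check whether a candidate counterfactual x_cf satisfies both
    guarantees under a set of posterior model samples (e.g., drawn
    via MC dropout or a deep ensemble). Illustrative sketch only."""
    # Target-class probability of the counterfactual under each sampled model.
    probs = np.array([m(x_cf) for m in posterior_models])
    # delta-safety: expected predictive confidence is at least delta.
    is_safe = probs.mean() >= delta
    # eps-robustness: predictive variance is at most eps.
    is_robust = probs.var() <= eps
    return bool(is_safe and is_robust)
```

A counterfactual that passes both checks would remain valid with high probability across plausible retrained models, which is the intuition behind the formal guarantees stated above.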