We present PPCEF, a novel method for generating probabilistically plausible counterfactual explanations (CFs). PPCEF advances beyond existing methods by combining a probabilistic formulation that leverages the data distribution with the optimization of plausibility within a unified framework. Compared to existing approaches, our method enforces plausibility by directly optimizing the explicit density function without assuming a particular family of parametrized distributions. This ensures CFs are not only valid (i.e., achieve class change) but also align with the probability density of the underlying data. To this end, our approach leverages normalizing flows as powerful density estimators to capture complex high-dimensional data distributions. Furthermore, we introduce a novel loss function that balances the trade-off between achieving class change and maintaining closeness to the original instance, while also incorporating a probabilistic plausibility term. PPCEF's unconstrained formulation allows for efficient gradient-based optimization with batch processing, yielding computation that is orders of magnitude faster than prior methods. This formulation also permits the seamless integration of additional constraints tailored to specific counterfactual properties. Finally, extensive evaluations demonstrate PPCEF's superiority in generating high-quality, probabilistically plausible counterfactual explanations in high-dimensional tabular settings. This makes PPCEF a powerful tool not only for interpreting complex machine learning models but also for improving fairness, accountability, and trust in AI systems.
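To make the optimization concrete, the sketch below minimizes a PPCEF-style objective by plain gradient descent: a validity term (hinge on the classifier logit toward the target class), a distance term (squared distance to the original instance), and a plausibility term (negative log-density). This is a minimal toy, not the paper's implementation: the classifier is a hand-written 2D linear model, an isotropic Gaussian stands in for the normalizing-flow density, and all loss weights (`lam_dist`, `lam_plaus`, `margin`) are illustrative choices.

```python
import math

# Toy 2D linear classifier f(x) = sigmoid(w.x + b); decision boundary x1 + x2 = 2.
W = (1.0, 1.0)
B = -2.0

def logit(x):
    return W[0] * x[0] + W[1] * x[1] + B

# Stand-in for the normalizing-flow log-density: a unit-variance Gaussian
# centered on a hypothetical data mean MU. PPCEF would instead use the
# exact log-density of a trained flow here.
MU = (1.5, 1.5)

def neg_log_density_grad(x):
    # gradient of -log p(x) for an isotropic unit-variance Gaussian
    return (x[0] - MU[0], x[1] - MU[1])

def ppcef_counterfactual(x0, lam_dist=0.5, lam_plaus=0.5, margin=2.0,
                         lr=0.05, steps=2000):
    """Gradient descent on: hinge validity + distance + plausibility."""
    x = list(x0)
    for _ in range(steps):
        z = logit(x)
        # validity term: max(0, margin - z); pushes logit up while z < margin
        gv = (-W[0], -W[1]) if z < margin else (0.0, 0.0)
        # distance term: ||x - x0||^2 keeps the CF close to the original
        gd = (2.0 * (x[0] - x0[0]), 2.0 * (x[1] - x0[1]))
        # plausibility term: -log p(x) pulls the CF into a dense data region
        gp = neg_log_density_grad(x)
        for i in range(2):
            x[i] -= lr * (gv[i] + lam_dist * gd[i] + lam_plaus * gp[i])
    return tuple(x)

x0 = (0.2, 0.3)              # logit(x0) = -1.5, so x0 is classified as 0
cf = ppcef_counterfactual(x0)
```

Because the objective is unconstrained, the same update rule applies unchanged to a whole batch of instances at once, which is the source of the speedup the abstract mentions; in practice the gradients would come from autodiff rather than the hand-derived expressions above.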