Explainable AI offers insights into the factors that drive a particular prediction of a black-box AI system. One popular interpretability approach is counterfactual explanation, which goes beyond explaining why a system arrives at a certain decision and also suggests what a user can do to alter the outcome. A counterfactual example must satisfy various constraints to be useful in real-world applications. These constraints trade off against one another, posing significant challenges for existing work. To this end, we propose a stochastic learning-based framework that effectively balances the counterfactual trade-offs. The framework consists of a generation module and a feature selection module with complementary roles: the former models the distribution of valid counterfactuals, whereas the latter enforces additional constraints in a way that allows for differentiable training and amortized optimization. We demonstrate the effectiveness of our method in generating actionable and plausible counterfactuals that are more diverse than those of existing methods, and in doing so more efficiently than the closest baselines.