Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimally perturbed data point for which the model's decision changes. Most existing methods provide only a single CE, which may not be achievable for the user. In this work, we derive an iterative method to calculate robust CEs, i.e., CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs, allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods, including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
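To make the notion of a robust CE concrete, the following minimal sketch treats only the simplest special case: a linear classifier (for example, logistic regression thresholded at probability 0.5) with feature perturbations bounded in the Euclidean norm. In that case, the robust CE closest to the factual point has a closed form. This is an illustration under stated assumptions, not the iterative method derived in this work; the function names, the perturbation radius `rho`, and the choice of the Euclidean norm for both the distance and the perturbation set are all assumptions made for the example.

```python
import numpy as np


def worst_case_score(x, w, b, rho):
    """Smallest value of w.(x + s) + b over all perturbations with ||s||_2 <= rho.

    For a linear score the minimum is attained at s = -rho * w / ||w||_2.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(w @ x + b - rho * np.linalg.norm(w))


def robust_ce_linear(x0, w, b, rho):
    """Closest point to x0 (in Euclidean distance) whose worst-case score is >= 0.

    Hypothetical helper for illustration: it projects x0 onto the half-space
    {x : w.x + b >= rho * ||w||_2}, i.e. the set of points that keep the
    desired label under every perturbation of norm at most rho.
    """
    x0 = np.asarray(x0, dtype=float)
    w = np.asarray(w, dtype=float)
    margin = worst_case_score(x0, w, b, rho)
    if margin >= 0:
        # x0 already receives the desired label robustly; no change is needed.
        return x0.copy()
    # Move along w just far enough to bring the worst-case score up to zero.
    return x0 - margin * w / (w @ w)


# Example: the factual point is rejected (score -1); request robustness radius 0.5.
x_ce = robust_ce_linear(x0=[0.0, 0.0], w=[1.0, 0.0], b=-1.0, rho=0.5)
```

Every point within distance `rho` of `x_ce` still receives the desired label, so the ball around `x_ce` is a region of valid CEs from which a user could pick. For nonlinear models such as tree ensembles or neural networks, no comparable closed form exists in general, which is where an iterative scheme becomes relevant.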