Counterfactual explanations are a popular method for analyzing the predictions of black-box systems, and they can enable computational recourse by suggesting actionable changes to the input that yield a different (i.e., more favorable) system output. However, recent work has highlighted their vulnerability to different types of manipulation. This work studies the vulnerability of counterfactual explanations to data poisoning. We formally introduce and investigate data poisoning in the context of counterfactual explanations, with the aim of increasing the cost of recourse on three different levels: locally for a single instance, for a sub-group of instances, or globally for all instances. In this context, we characterize several different data poisoning attacks and prove their correctness. We also empirically demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.
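To make the notion of "increasing the cost of recourse" concrete, the following is a minimal toy sketch, not the construction analyzed in this work. It assumes a nearest-centroid classifier on hand-picked 2D points, takes the favorable class to be label 1, and measures the cost of recourse as the Euclidean distance from an instance to the decision boundary (the cost of the closest counterfactual for a linear boundary). The helper names `fit_centroid_classifier` and `recourse_cost`, the data, and the poisoning points are all illustrative assumptions.

```python
import numpy as np


def fit_centroid_classifier(X, y):
    """Fit a nearest-centroid classifier, returned as a linear rule (w, b).

    An instance x is assigned the favorable class 1 when x @ w + b > 0,
    i.e. when it is closer to the class-1 centroid than to the class-0 one.
    """
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    w = mu1 - mu0
    b = -(mu1 @ mu1 - mu0 @ mu0) / 2.0
    return w, b


def recourse_cost(x, w, b):
    """L2 distance from x to the decision boundary (0 if already favorable)."""
    score = x @ w + b
    return max(0.0, -score) / np.linalg.norm(w)


# Clean training data (assumed): class 0 on the left, class 1 on the right.
X_clean = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [3, 0], [3, 1], [4, 0], [4, 1]], dtype=float)
y_clean = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Poisoned samples (assumed): extra class-1 points placed far from class 0,
# which drags the class-1 centroid, and with it the boundary, away from class 0.
X_poison = np.array([[5.5, 0], [5.5, 1], [5.5, 0], [5.5, 1]], dtype=float)
y_poison = np.array([1, 1, 1, 1])

X_poisoned = np.vstack([X_clean, X_poison])
y_poisoned = np.concatenate([y_clean, y_poison])

x = np.array([1.0, 0.5])  # an instance that receives the unfavorable output

w_clean, b_clean = fit_centroid_classifier(X_clean, y_clean)
w_pois, b_pois = fit_centroid_classifier(X_poisoned, y_poisoned)

cost_clean = recourse_cost(x, w_clean, b_clean)
cost_poisoned = recourse_cost(x, w_pois, b_pois)

# Accuracy on the clean data is unchanged in this toy setup.
preds_poisoned = (X_clean @ w_pois + b_pois > 0).astype(int)

print(f"cost of recourse (clean):    {cost_clean:.2f}")
print(f"cost of recourse (poisoned): {cost_poisoned:.2f}")
print(f"clean-data accuracy after poisoning: {(preds_poisoned == y_clean).mean():.2f}")
```

In this toy setting the poisoned samples shift the boundary for every unfavorably classified instance, which loosely corresponds to the global level described above; targeting a single instance or a sub-group would require poisoning tailored to those instances.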