Counterfactual explanations are a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes to the input that yield a different (i.e., more favorable) system output. However, recent work has highlighted their vulnerability to different types of manipulations. This work studies the vulnerability of counterfactual explanations to data poisoning. We formalize data poisoning in the context of counterfactual explanations, where the goal is to increase the cost of recourse on three different levels: locally for a single instance, for a sub-group of instances, or globally for all instances. We demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.