Counterfactual explanations are a popular method for analyzing the predictions of black-box systems. They can also enable computational recourse by suggesting actionable changes to the input that would yield a different (i.e., more favorable) system output. However, recent work has highlighted their vulnerability to several types of manipulation. This work studies the vulnerability of counterfactual explanations to data poisoning. We formalize data poisoning in the context of counterfactual explanations as an attack that increases the cost of recourse at three levels: locally for a single instance, for a sub-group of instances, or globally for all instances. We demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.
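To make the notion of "increasing the cost of recourse" concrete, the following is a minimal sketch and not the formalization or the attack studied in this work. It assumes a toy nearest-centroid classifier, measures the cost of recourse as the Euclidean distance from an instance to the decision boundary, and uses hand-picked poisoned points; the function names `fit_nearest_centroid` and `recourse_cost` and all data values are hypothetical.

```python
from math import sqrt

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit_nearest_centroid(X, y):
    """Toy classifier (an assumption): returns (w, b) such that
    w.x + b > 0 predicts class 1, the favorable outcome."""
    mu0 = centroid([x for x, label in zip(X, y) if label == 0])
    mu1 = centroid([x for x, label in zip(X, y) if label == 1])
    w = [a - c for a, c in zip(mu1, mu0)]
    b = (sum(c * c for c in mu0) - sum(a * a for a in mu1)) / 2
    return w, b

def recourse_cost(x, w, b):
    """Euclidean distance from x to the decision boundary, i.e. the cost
    of the closest counterfactual; 0 if x is already classified favorably."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return 0.0
    return -score / sqrt(sum(wi * wi for wi in w))

# Clean training data: class 0 (unfavorable) and class 1 (favorable).
X_clean = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 4], [5, 4], [4, 5], [5, 5]]
y_clean = [0, 0, 0, 0, 1, 1, 1, 1]

# Hand-picked poison (illustrative only): points inside the favorable
# region labeled as unfavorable, which pushes the boundary away from class 0.
X_poison = [[4, 4], [4, 4]]
y_poison = [0, 0]

x_target = [1, 1]  # instance whose recourse the attacker wants to make costlier

w_clean, b_clean = fit_nearest_centroid(X_clean, y_clean)
w_pois, b_pois = fit_nearest_centroid(X_clean + X_poison, y_clean + y_poison)

clean_cost = recourse_cost(x_target, w_clean, b_clean)
poisoned_cost = recourse_cost(x_target, w_pois, b_pois)

print(f"cost of recourse on clean data:    {clean_cost:.3f}")
print(f"cost of recourse on poisoned data: {poisoned_cost:.3f}")
```

In this toy setting the target instance stays on the unfavorable side of the boundary, but the closest counterfactual moves farther away once the poisoned points are added. Averaging `recourse_cost` over a sub-group or over all unfavorably classified instances would give analogous group-level and global measures.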