Recent work has connected adversarial attack methods and algorithmic recourse methods: both seek minimal changes to an input instance that alter a model's classification decision. It has been shown that traditional adversarial training, which seeks to minimize a classifier's susceptibility to malicious perturbations, increases the cost of generated recourse, with larger adversarial training radii correlating with higher recourse costs. From the perspective of algorithmic recourse, however, the appropriate adversarial training radius has remained unknown. A separate line of recent work has proposed adversarial training with adaptive, per-instance training radii to address the fact that adversarial vulnerability varies across instances, and has shown success in domains where the attack radius is unknown. This work studies the effects of adaptive adversarial training on algorithmic recourse costs. We establish that the improvements in model robustness induced by adaptive adversarial training have little effect on algorithmic recourse costs, providing a potential avenue for affordable robustness in domains where recoursability is critical.