As machine learning models are increasingly deployed in high-stakes settings, it is important to ensure that their predictions are not only adversarially robust but also readily explainable to relevant stakeholders. However, it is unclear whether these two properties can be achieved simultaneously or whether there are trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations, which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences in cost and validity between the recourses that state-of-the-art algorithms generate for adversarially robust vs. non-robust linear and non-linear models. Our empirical results on multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.
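To make the two quantities named above concrete, the following is a minimal sketch for a linear classifier. It is illustrative only and rests on assumptions not stated in the abstract: cost is taken to be the L2 distance between the original input and the recourse, validity is taken to be the fraction of recourses that receive a positive prediction, and the recourse itself is the closest point just past the decision boundary. The function names (`linear_recourse`, `recourse_cost`, `recourse_validity`) and the `margin` parameter are hypothetical, and this is not one of the recourse algorithms evaluated in the work.

```python
import numpy as np


def linear_recourse(x, w, b, margin=1e-3):
    """Return the L2-closest point to x whose score w.x + b equals `margin`.

    Assumes a linear classifier that predicts positive when w.x + b > 0.
    If x already scores at least `margin`, it is returned unchanged.
    """
    score = np.dot(w, x) + b
    if score >= margin:
        return x.copy()
    # Move along w, the direction that raises the score fastest per unit L2 length.
    step = (margin - score) / np.dot(w, w)
    return x + step * w


def recourse_cost(x, x_cf):
    """Cost of a recourse, assumed here to be the L2 distance from x to x_cf."""
    return float(np.linalg.norm(x_cf - x))


def recourse_validity(X_cf, w, b):
    """Fraction of recourses (rows of X_cf) that receive a positive prediction."""
    return float(np.mean(X_cf @ w + b > 0))


# Toy usage with made-up numbers: one rejected input and its recourse.
w = np.array([3.0, 4.0])
b = -5.0
x = np.array([0.0, 0.0])
x_cf = linear_recourse(x, w, b)
print("cost:", recourse_cost(x, x_cf))
print("validity:", recourse_validity(np.array([x_cf]), w, b))
```

Under these assumptions, comparing a robust and a non-robust model amounts to computing the same two numbers for each model's recourses on the same inputs.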