Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. Because explainers serve as crucial diagnostic tools, it is important that they themselves be robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to the astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower-bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
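To make the notion of "similar explanations for similar inputs" concrete, the following is a minimal sketch of how an astuteness-style score might be estimated from samples. The abstract does not state the formal definition, so the form used here is an assumption: the fraction of input pairs within Euclidean distance `r` whose explanations differ by at most `lam` times the input distance. The names `empirical_astuteness`, `r`, `lam`, and the toy weight-times-input explainer are all hypothetical and chosen only for illustration.

```python
import numpy as np


def empirical_astuteness(explain, X, r, lam):
    """Estimate an astuteness-style score from samples (assumed definition).

    Returns the fraction of pairs (x, x') with ||x - x'|| <= r that also
    satisfy ||explain(x) - explain(x')|| <= lam * ||x - x'||.
    Returns NaN if no pair of points lies within distance r.
    """
    X = np.asarray(X, dtype=float)
    E = np.array([explain(x) for x in X], dtype=float)
    close_pairs = 0
    smooth_pairs = 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d_in = np.linalg.norm(X[i] - X[j])
            if d_in > r:
                continue  # only nearby pairs count
            close_pairs += 1
            d_ex = np.linalg.norm(E[i] - E[j])
            # Small tolerance guards against floating-point round-off.
            if d_ex <= lam * d_in + 1e-12:
                smooth_pairs += 1
    return smooth_pairs / close_pairs if close_pairs else float("nan")


# Toy setting (an assumption, not from the paper): a linear predictor
# f(x) = w . x with a weight-times-input attribution as the explainer.
w = np.array([2.0, -1.0, 0.5])


def toy_explainer(x):
    return w * x  # per-feature contribution to f(x)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))

# Here ||w * (x - x')|| <= max|w| * ||x - x'||, so lam = max|w| suffices.
score_loose = empirical_astuteness(toy_explainer, X, r=0.3, lam=np.abs(w).max())
score_tight = empirical_astuteness(toy_explainer, X, r=0.3, lam=0.0)
print(score_loose, score_tight)
```

In this toy case the explainer inherits its smoothness directly from the linear predictor, which mirrors the qualitative claim that locally smooth prediction functions lend themselves to locally robust explanations. The bounds derived in the paper for SHAP, RISE, and CXPlain are not reproduced by this sketch.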