Machine learning methods have improved significantly in their predictive capabilities, but at the same time they have become more complex and less transparent. As a result, explainers are often relied upon to provide interpretability for these black-box prediction models. Because explainers serve as crucial diagnostic tools, it is important that they themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to the astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower-bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on both simulated and real datasets.
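To make the two central notions concrete, the following is a minimal sketch of how they are typically formalized; the constants $L$, $\lambda$, $r$, $\alpha$, the data distribution $\mathcal{D}$, and the explainer map $E$ below are illustrative assumptions based on the abstract, not necessarily the paper's exact definitions. A prediction function $f$ is probabilistically Lipschitz with constant $L$ at radius $r$, with probability at least $1 - \alpha$, if

\[
\Pr_{x, x' \sim \mathcal{D}}\!\left[\, \lVert f(x) - f(x') \rVert \le L \,\lVert x - x' \rVert \;\middle|\; \lVert x - x' \rVert \le r \,\right] \ge 1 - \alpha,
\]

and, analogously, an explainer $E$ is astute if nearby inputs receive close explanations with high probability:

\[
\Pr_{x, x' \sim \mathcal{D}}\!\left[\, \lVert E(x) - E(x') \rVert \le \lambda \;\middle|\; \lVert x - x' \rVert \le r \,\right] \ge 1 - \alpha.
\]

Under this reading, the lower-bound guarantees would relate $\lambda$ to $L$, so that a small Lipschitz constant for the predictor translates into a correspondingly strong astuteness guarantee for the explainer.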