Because explanations have no ground truth, the objective evaluation of explainability methods is an essential research direction. So far, the vast majority of evaluations fall into three categories: human evaluation, sensitivity testing, and sanity checks. This work proposes a novel evaluation methodology from the perspective of generalizability. We employ an autoencoder to learn the distributions of the generated explanations and observe their learnability as well as the plausibility of the learned distributional features. We first briefly demonstrate the idea behind the proposed approach on LIME, and then quantitatively evaluate multiple popular explainability methods. We also find that smoothing the explanations with SmoothGrad can significantly enhance their generalizability.
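The evaluation idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that explanations are flattened attribution vectors, that "learnability" can be proxied by held-out reconstruction error, and that a closed-form linear autoencoder (equivalent to PCA via SVD) is an acceptable stand-in for a trained autoencoder. The data are synthetic, and averaging noisy copies is only a crude stand-in for SmoothGrad; all function and variable names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, bottleneck):
    """Closed-form linear autoencoder: the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:bottleneck]

def reconstruction_error(X, mean, components):
    """Mean squared error after encoding and decoding X."""
    codes = (X - mean) @ components.T      # encode into the bottleneck
    recon = codes @ components + mean      # decode back to input space
    return float(np.mean((recon - X) ** 2))

def heldout_error(explanations, bottleneck=4, n_train=300):
    """Fit on the first n_train explanations, score on the remainder."""
    train, test = explanations[:n_train], explanations[n_train:]
    mean, components = fit_linear_autoencoder(train, bottleneck)
    return reconstruction_error(test, mean, components)

# Synthetic "explanations" (assumption): low-rank structure plus noise.
rng = np.random.default_rng(0)
n, d, rank = 400, 32, 4
clean = rng.normal(size=(n, rank)) @ (rng.normal(size=(rank, d)) / np.sqrt(rank))

# Raw explanations carry a single draw of noise.
raw = clean + rng.normal(scale=0.5, size=(n, d))
# Smoothed explanations average 25 noisy copies (crude SmoothGrad stand-in).
smoothed = clean + rng.normal(scale=0.5, size=(25, n, d)).mean(axis=0)

raw_err = heldout_error(raw)
smooth_err = heldout_error(smoothed)
print(f"held-out error, raw: {raw_err:.4f}  smoothed: {smooth_err:.4f}")
```

Under these assumptions, a lower held-out error indicates that the distribution of explanations is easier to learn and generalizes better to unseen samples. A real study would replace the synthetic arrays with attributions produced by an explainability method such as LIME.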