Despite the excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how DNNs make a given decision remains an open problem, which has motivated a number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, because ground truth is lacking and the available interpretability methods have diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluating interpretability methods is to perturb the input features deemed important for a given prediction and observe the decrease in accuracy. However, the perturbation itself may introduce artifacts. We propose a method for estimating the impact of such artifacts on the fidelity estimation by utilizing the model accuracy curves obtained from perturbing input features in Most Important First (MIF) and Least Important First (LIF) order. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation for four popular post-hoc interpretability methods.
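To make the MIF and LIF perturbation orders concrete, the following is a minimal sketch of how the two accuracy curves could be computed. It is not the paper's implementation: the toy linear classifier, the zero baseline, the contribution-based attributions, and the names `perturbation_curve`, `predict`, and `gap` are all assumptions introduced for illustration. The mean gap between the curves is only one possible summary and is not claimed to be the proposed fidelity estimate. A linear model with a zero baseline also cannot exhibit the perturbation artifacts discussed above; the sketch only shows how the curves are built.

```python
import numpy as np

def perturbation_curve(predict, X, y, attributions, order="MIF", baseline=0.0):
    """Accuracy after perturbing 0, 1, ..., d features per sample.

    order="MIF" perturbs the highest-attribution features first,
    order="LIF" perturbs the lowest-attribution features first.
    """
    if order not in ("MIF", "LIF"):
        raise ValueError("order must be 'MIF' or 'LIF'")
    n, d = X.shape
    # argsort is ascending, i.e. least important first; reverse it for MIF.
    ranking = np.argsort(attributions, axis=1)
    if order == "MIF":
        ranking = ranking[:, ::-1]
    X_pert = X.copy()
    rows = np.arange(n)
    accuracies = [np.mean(predict(X_pert) == y)]
    for k in range(d):
        # Replace the k-th ranked feature of every sample with the baseline.
        X_pert[rows, ranking[:, k]] = baseline
        accuracies.append(np.mean(predict(X_pert) == y))
    return np.array(accuracies)

# Hypothetical toy setup: a linear binary classifier on random data.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 8
X = rng.normal(size=(n_samples, n_features))
w = rng.normal(size=n_features)

def predict(inputs):
    return (inputs @ w > 0).astype(int)

y = predict(X)  # use the model's own predictions as labels (assumption)

# Assumed attribution: each feature's signed contribution to the predicted class.
scores = X @ w
attributions = X * w * np.where(scores > 0, 1.0, -1.0)[:, None]

mif = perturbation_curve(predict, X, y, attributions, order="MIF")
lif = perturbation_curve(predict, X, y, attributions, order="LIF")

# Illustrative summary only: mean gap between the LIF and MIF curves.
gap = float(np.mean(lif - mif))
print("mean LIF - MIF accuracy gap:", gap)
```

For an attribution method that ranks features well, accuracy should fall quickly under MIF and slowly under LIF, so a larger gap suggests a more faithful ranking in this toy setting. Both curves start from the unperturbed accuracy and end at the accuracy of the fully perturbed input.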