Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning

Explaining the predictions of black-box neural networks is crucial when such networks are applied to decision-critical tasks. Attribution maps are therefore commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, the similarities of latent features to prototypes are linearly classified to form predictions, and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations, using ProtoPNet as an example. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They make it possible to faithfully answer which prototypes are present in an unseen image and to quantify each pixel's contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet in all experiments by a factor of more than $10^3$.
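
To illustrate what "Shapley values calculated on the similarity score of a prototype" means, the following minimal sketch computes exact Shapley values for a toy four-pixel input by enumerating all pixel subsets. It is not the procedure used by ProtoPFaith: the names `similarity` and `shapley_values`, the all-zero baseline, the toy prototype, and the log-ratio form of the similarity are assumptions made for illustration only, and exact enumeration is feasible only for a handful of inputs.

```python
from itertools import combinations
from math import factorial, log


def similarity(pixels, prototype, eps=1e-4):
    """Toy similarity score (assumed form): large when pixels are close to the prototype."""
    # Squared L2 distance between the input and the hypothetical prototype.
    d = sum((a - b) ** 2 for a, b in zip(pixels, prototype))
    return log((d + 1.0) / (d + eps))


def shapley_values(value_fn, x, baseline):
    """Exact Shapley value of every input position by enumerating all subsets.

    Positions outside a subset are replaced by the baseline value.
    The cost grows exponentially with len(x), so this only works for tiny inputs.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude position i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of position i to this subset.
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


# Hypothetical toy data: four "pixels", one prototype, an all-zero baseline.
prototype = [1.0, 0.0, 0.0, 1.0]
x = [0.9, 0.1, 0.0, 0.7]
baseline = [0.0, 0.0, 0.0, 0.0]


def score(pixels):
    return similarity(pixels, prototype)


phi = shapley_values(score, x, baseline)
# Difference between the score of the full input and the score of the baseline.
total = score(x) - score(baseline)
```

In this sketch the per-pixel contributions in `phi` sum to `total`, and the third pixel, which equals its baseline value, receives a contribution of exactly zero. These are two of the properties commonly required of a faithful attribution.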
