Convolutional Neural Networks (CNNs) have shown remarkable performance in image classification. However, interpreting their predictions is challenging due to the size and complexity of these models. State-of-the-art saliency methods generate local explanations that highlight the area of the input image where a class is identified, but they cannot explain how a concept of interest contributes to the prediction. Concept-based methods such as TCAV, on the other hand, provide insights into how sensitive the network is to a human-defined concept, but they can neither compute the concept's attribution to a specific prediction nor show its location within the input image. We introduce Visual-TCAV, a novel explainability framework that aims to bridge the gap between these methods by providing both local and global explanations. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate class-agnostic saliency maps that show where the network recognizes a certain concept. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. We evaluate the method's faithfulness via a controlled experiment in which the ground truth for explanations is known, showing better ground truth alignment than TCAV. Our code is available at https://github.com/DataSciencePolimi/Visual-TCAV.
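For readers unfamiliar with Integrated Gradients, the attribution technique that Visual-TCAV generalizes, the following is a minimal sketch of the standard formulation only. It is not the generalization used by Visual-TCAV, and it is not taken from the authors' repository. The toy model, its analytic gradient, and all names (`integrated_gradients`, `model`, `model_grad`, `w`) are hypothetical and chosen purely for illustration. The sketch attributes a scalar output to each input feature by averaging gradients along a straight path from a baseline to the input.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate standard Integrated Gradients with a midpoint Riemann sum."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight-line path from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    # Average the gradient over the path, then scale by the input difference.
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Hypothetical toy "model": F(x) = sum(w * x**2), with its analytic gradient.
w = np.array([1.0, -2.0, 0.5])

def model(x):
    return float(np.sum(w * x ** 2))

def model_grad(x):
    return 2.0 * w * x

x = np.array([1.0, 2.0, -3.0])
baseline = np.zeros_like(x)  # assumed all-zero baseline

attributions = integrated_gradients(model_grad, x, baseline)
# Completeness check: attributions should sum to F(x) - F(baseline).
completeness_gap = attributions.sum() - (model(x) - model(baseline))
```

In this toy setting the attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property that makes Integrated Gradients a natural basis for per-prediction attribution.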