Convolutional Neural Networks (CNNs) have seen significant performance improvements in recent years. However, due to their size and complexity, they function as black boxes, raising transparency concerns. State-of-the-art saliency methods generate local explanations that highlight the area in the input image where a class is identified, but they cannot explain how a concept of interest contributes to the prediction, which is essential for bias mitigation. On the other hand, concept-based methods, such as TCAV (Testing with Concept Activation Vectors), provide insights into how sensitive the network is to a concept, but cannot compute its attribution in a specific prediction nor show its location within the input image. This paper introduces a novel post-hoc explainability framework, Visual-TCAV, which aims to bridge the gap between these methods by providing both local and global explanations for CNN-based image classification. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate saliency maps that show where concepts are recognized by the network. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. This framework is evaluated on popular CNN architectures, with its validity further confirmed via experiments where ground truth for explanations is known, and a comparison with TCAV. Our code will be made available soon.
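To make the CAV idea the abstract builds on concrete, the following is a minimal, hypothetical sketch: a CAV is the normal to a linear boundary separating concept activations from random activations at a chosen layer, and the TCAV score is the fraction of inputs whose class-logit gradient points along that direction. All data here is synthetic (the activations and gradients are stand-ins, not taken from a real CNN), and none of this reflects the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for intermediate-layer activations (in practice these
# come from a CNN layer, for concept images vs. random counterexamples).
concept_acts = rng.normal(loc=1.0, size=(50, 8))
random_acts = rng.normal(loc=-1.0, size=(50, 8))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)

# The CAV is the (normalized) normal of the linear decision boundary
# separating concept from random activations.
clf = LogisticRegression().fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# TCAV-style sensitivity score: the fraction of inputs whose class-logit
# gradient (faked here with random vectors) has a positive directional
# derivative along the CAV.
grads = rng.normal(loc=0.5, size=(100, 8))
tcav_score = float(np.mean(grads @ cav > 0))
```

Visual-TCAV, per the abstract, goes beyond this scalar score by projecting concept recognition into saliency maps and attributing concepts via a generalization of Integrated Gradients.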