Importance estimators are explainability methods that quantify feature importance for deep neural networks (DNNs). In vision transformers (ViTs), the self-attention mechanism naturally yields attention maps, which are sometimes used as importance scores indicating which input features a ViT model focuses on. However, attention maps do not account for signals from downstream tasks. To generate explanations that are sensitive to downstream tasks, we have developed class-discriminative attention maps (CDAM), a gradient-based extension that estimates feature importance with respect to a known class or a latent concept. CDAM scales attention scores by how relevant the corresponding tokens are for the predictions of a classifier head. Beyond targeting the supervised classifier, CDAM can explain an arbitrary concept shared by selected samples by measuring similarity in the latent space of the ViT. We also introduce Smooth CDAM and Integrated CDAM, which average a series of CDAMs computed with slightly altered tokens. We quantitatively benchmark correctness, compactness, and class sensitivity against six other importance estimators. Vanilla, Smooth, and Integrated CDAM excel across all three benchmarks. In particular, our results suggest that existing importance estimators may not provide sufficient class sensitivity. We demonstrate the utility of CDAM in medical imaging by training and explaining malignancy and biomarker prediction models based on lung computed tomography (CT) scans. Overall, CDAM is shown to be highly class-discriminative and semantically relevant, while providing compact explanations.
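The idea of scaling attention by token relevance can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a gradient-times-token form for the score (the abstract only says the method is gradient-based), replaces the ViT with a hypothetical attention-pooled linear head, and swaps backpropagation for central finite differences so that it runs without a deep learning library. All names (`class_score`, `cdam_sketch`, `smooth_cdam_sketch`) and the noise settings are illustrative assumptions. Integrated CDAM is not shown.

```python
import random


def class_score(tokens, attn, weights):
    """Toy classifier head (assumption): attention-pooled tokens, then a linear layer."""
    dim = len(weights)
    pooled = [sum(a * t[j] for a, t in zip(attn, tokens)) for j in range(dim)]
    return sum(w * p for w, p in zip(weights, pooled))


def numerical_grad(f, tokens, eps=1e-5):
    """Central-difference gradient of f with respect to every token entry.

    Stands in for backpropagation; a real model would use autodiff instead.
    """
    grad = [[0.0] * len(t) for t in tokens]
    for i, t in enumerate(tokens):
        for j in range(len(t)):
            original = t[j]
            t[j] = original + eps
            up = f(tokens)
            t[j] = original - eps
            down = f(tokens)
            t[j] = original  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad


def cdam_sketch(tokens, attn, weights):
    """Per-token score: sum over dimensions of token value times gradient (assumed form)."""
    tokens = [list(t) for t in tokens]  # work on a copy
    grad = numerical_grad(lambda x: class_score(x, attn, weights), tokens)
    return [sum(tv * gv for tv, gv in zip(t, g)) for t, g in zip(tokens, grad)]


def smooth_cdam_sketch(tokens, attn, weights, n_samples=20, sigma=0.05, seed=0):
    """Average the per-token scores over tokens perturbed with Gaussian noise."""
    rng = random.Random(seed)
    total = [0.0] * len(tokens)
    for _ in range(n_samples):
        noisy = [[v + rng.gauss(0.0, sigma) for v in t] for t in tokens]
        for i, s in enumerate(cdam_sketch(noisy, attn, weights)):
            total[i] += s
    return [s / n_samples for s in total]


# Three 2-dimensional tokens with fixed attention weights (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attn = [0.5, 0.3, 0.2]
class_weights = {"cat": [1.0, -1.0], "dog": [-1.0, 1.0]}

cat_map = cdam_sketch(tokens, attn, class_weights["cat"])
dog_map = cdam_sketch(tokens, attn, class_weights["dog"])
smooth_cat_map = smooth_cdam_sketch(tokens, attn, class_weights["cat"])
```

In this toy setting the score for token i reduces to its attention weight times the dot product of the class weights with the token, so the attention map [0.5, 0.3, 0.2] is the same for both classes while the per-class scores flip sign between "cat" and "dog". That contrast is the kind of class sensitivity the abstract refers to, though only for this simplified head.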