Interpretability methods are critical for examining and exploring deep neural networks (DNNs) and for increasing our understanding of, and trust in, them. Vision transformers (ViTs), which can be trained to state-of-the-art performance with self-supervised learning (SSL), provide built-in attention maps (AMs). While AMs can provide high-quality semantic segmentation of input images, they do not account for any signal coming from a downstream classifier. We introduce class-discriminative attention maps (CDAM), a novel post-hoc explanation method that is highly sensitive to the target class. Our method essentially scales attention scores by how relevant the corresponding tokens are for the predictions of a classifier head. As an alternative to classifier outputs, CDAM can also explain a user-defined concept by targeting similarity measures in the latent space of the ViT. This allows for explanations of arbitrary concepts, which the user defines through a few sample images. We investigate the operating characteristics of CDAM in comparison with relevance propagation (RP) and token ablation maps (TAM), an alternative to pixel occlusion methods. CDAM is highly class-discriminative and semantically relevant, while providing implicit regularization of relevance scores. PyTorch implementation: \url{https://github.com/lenbrocki/CDAM}. Live web demo: \url{https://cdam.informatism.com/}.
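To make the idea of scaling attention scores by token relevance concrete, the following is a minimal toy sketch, not the code from the CDAM repository. It assumes a deliberately simplified model: a single attention-pooling step with fixed attention weights, followed by a linear classifier head. It also assumes that relevance is measured as gradient times input. For the concept case it assumes a dot product with the mean embedding of the sample images as the similarity measure. The function names `cdam_toy` and `concept_target` and all numbers are hypothetical.

```python
# Toy illustration only: one attention-pooling step plus a linear head.
# All names and values are hypothetical, not taken from the CDAM codebase.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cdam_toy(attention, tokens, target_vector):
    """Scale each token's attention score by its relevance to a target.

    Assumed model: score = target_vector . sum_i(a_i * t_i), a_i held fixed.
    Then d(score)/d(t_i) = a_i * target_vector, and gradient-times-input
    for token i is a_i * (target_vector . t_i).
    """
    return [a * dot(target_vector, t) for a, t in zip(attention, tokens)]

def concept_target(concept_embeddings):
    """Mean embedding of a few sample images (assumed concept definition)."""
    n = len(concept_embeddings)
    return [sum(column) / n for column in zip(*concept_embeddings)]

# Three tokens with two features each, and their attention scores.
attention = [0.5, 0.3, 0.2]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical classifier-head weights for two classes.
w_cat = [1.0, -1.0]
w_dog = [-1.0, 1.0]

cdam_cat = cdam_toy(attention, tokens, w_cat)
cdam_dog = cdam_toy(attention, tokens, w_dog)

# Concept variant: the target is built from sample embeddings instead of a head.
concept = concept_target([[1.0, 0.0], [3.0, 0.0]])
cdam_concept = cdam_toy(attention, tokens, concept)

print("attention   :", attention)
print("CDAM (cat)  :", cdam_cat)
print("CDAM (dog)  :", cdam_dog)
print("CDAM concept:", cdam_concept)
```

In this toy setting the attention scores are identical for both classes, whereas the scaled maps change sign between the two classes and assign zero relevance to the token that carries no class evidence. A real ViT has multiple heads and layers, so the gradient would normally be obtained by automatic differentiation rather than the closed form used here.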