Recent developments in the field of explainable artificial intelligence (XAI) for vision models investigate the information extracted by their feature encoders. We contribute to this effort and propose Neuro-Activated Vision Explanations (NAVE), which extracts the information captured by the encoder by clustering the feature activations of the frozen network under study. The method does not aim to explain the model's prediction but to answer questions such as which parts of the image are processed similarly or which information is retained in deeper layers. Experimentally, we leverage NAVE to show that the training dataset and the level of supervision affect which concepts are captured. In addition, our method reveals the impact of registers on vision transformers (ViTs) and the information saturation caused by the Clever Hans effect of watermarks in the training set.
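To make the core mechanism concrete, here is a minimal sketch of clustering the feature activations of a frozen encoder, assuming a torchvision ResNet-50 truncated before global pooling and k-means over its spatial feature vectors; the backbone, truncation point, and cluster count are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: cluster the spatial activations of a frozen encoder.
# Assumptions (not from the paper): ResNet-50 backbone, last conv stage,
# k-means with an illustrative cluster count.
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

def cluster_activations(image: torch.Tensor, n_clusters: int = 5):
    """Cluster per-location feature activations of a frozen encoder.

    image: tensor of shape (1, 3, H, W), already normalized.
    Returns an (h, w) integer map assigning each spatial location to a cluster.
    """
    # Frozen encoder: a pretrained ResNet-50 truncated before avgpool/fc.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    encoder = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()
    for p in encoder.parameters():
        p.requires_grad_(False)

    with torch.no_grad():
        feats = encoder(image)  # (1, C, h, w) feature activations

    _, c, h, w = feats.shape
    # One feature vector per spatial location: shape (h*w, C).
    vectors = feats.squeeze(0).permute(1, 2, 0).reshape(-1, c).cpu().numpy()

    # Group locations that the encoder processes similarly.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    return labels.reshape(h, w)
```

Each spatial position in the feature map becomes one vector, so the resulting label map groups image regions that the encoder treats alike; visualizing it over the input image yields a segmentation-like explanation of what the encoder captures.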