We propose a permutation-based explanation method for image classifiers. Existing explanations for image models, such as activation maps, are limited to instance-level explanations in pixel space, making it difficult to understand global model behavior. In contrast, permutation-based explanations for tabular data classifiers measure feature importance by comparing model performance on data before and after permuting a feature. We propose an explanation method for image-based models that permutes interpretable concepts across dataset images. Given a dataset of images annotated with specific concepts, such as captions, we permute a concept across examples in the text space and then generate images via a text-conditioned diffusion model. Feature importance is then reflected by the change in model performance relative to the unpermuted data. Applied to a set of concepts, the method yields a ranking of feature importance. We show that this approach recovers underlying model feature importance on synthetic and real-world image classification tasks.
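The core loop described above — permute one concept's values across captions, re-render, and compare model scores against the unpermuted baseline — could be sketched as follows. This is a minimal illustration, not the paper's implementation: `render` is a stand-in for the text-conditioned diffusion model (here just the identity on caption strings), and the toy "model" reads its prediction directly off the textual image.

```python
import random

def permute_concept(captions, concept_values, rng):
    """Shuffle one concept's values across examples in the text space.

    concept_values[i] is the value the concept takes in captions[i];
    each caption has its original value replaced by a shuffled one.
    """
    permuted = concept_values[:]
    rng.shuffle(permuted)
    return [cap.replace(old, new)
            for cap, old, new in zip(captions, concept_values, permuted)]

def concept_importance(score, captions, concept_values, render, seed=0):
    """Importance = baseline score minus score on images re-rendered
    from the concept-permuted captions (larger drop = more important)."""
    rng = random.Random(seed)
    baseline = score([render(c) for c in captions])
    shuffled = permute_concept(captions, concept_values, rng)
    return baseline - score([render(c) for c in shuffled])

# Toy example: the classifier's task is to predict the color word,
# so permuting the color concept should hurt it, while permuting the
# vehicle-type concept should not.
captions = ["a red car", "a blue car", "a red truck", "a blue truck"]
labels = ["red", "blue", "red", "blue"]

def score(images):
    # Toy "classifier": reads the color straight off the (textual) image.
    preds = ["red" if "red" in img else "blue" for img in images]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

render = lambda caption: caption  # stand-in for the diffusion model

color_importance = concept_importance(
    score, captions, ["red", "blue", "red", "blue"], render)
vehicle_importance = concept_importance(
    score, captions, ["car", "truck", "car", "truck"], render)
```

Under this setup the vehicle concept gets zero importance (its permutation never changes the color word the toy model relies on), while the color concept's importance is the accuracy drop induced by the shuffle, ranking color above vehicle type exactly as the method intends.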