Image classification is a core task in data analysis, and many applications critically demand explainable models. Although numerous methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches are inefficient at extracting global knowledge about the classification task, which makes them vulnerable to local traps and often yields poor explanation accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated codes: a class-associated code and an individual code. Recombining the individual code of a given sample with an altered class-associated code yields a realistic synthetic sample that preserves the sample's individual characteristics while modifying its class-associated features, possibly flipping its class assignment. We propose a building-block coherency feature extraction algorithm that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Each individual sample can then be explained in a counterfactual generation manner: the sample is continuously modified in one direction, by shifting its class-associated code along a guided path, until its classification outcome changes. We compare our method with state-of-the-art approaches on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher explanation accuracy. The code is available at https://github.com/xrt11/XAI-CODE.
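The code-recombination idea above can be illustrated with a minimal toy sketch. This is not the paper's CAE implementation: the linear "encoder" and "decoder", the dimensions, and all function names here are hypothetical stand-ins, used only to show how fixing the individual code while sliding the class-associated code along a straight guided path produces a sequence of counterfactual samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper): each sample of dimension D
# is split into a class-associated code (C_DIM) and an individual code (I_DIM).
D, C_DIM, I_DIM = 8, 3, 5

# Stand-in random linear maps; the paper's CAE learns neural encoders/decoders.
W_enc = rng.normal(size=(D, C_DIM + I_DIM))
W_dec = rng.normal(size=(C_DIM + I_DIM, D))

def encode(x):
    """Split a sample into (class-associated code, individual code)."""
    z = x @ W_enc
    return z[:C_DIM], z[C_DIM:]

def decode(class_code, individual_code):
    """Recombine the two codes into a synthetic sample."""
    return np.concatenate([class_code, individual_code]) @ W_dec

def counterfactual_path(x, target_class_code, steps=5):
    """Keep x's individual code fixed and shift its class-associated code
    linearly toward a target class code, yielding one sample per step."""
    c, ind = encode(x)
    return [decode(c + t * (target_class_code - c), ind)
            for t in np.linspace(0.0, 1.0, steps)]

# Move sample x_a toward the class-associated code of sample x_b.
x_a, x_b = rng.normal(size=D), rng.normal(size=D)
c_b, _ = encode(x_b)
path = counterfactual_path(x_a, c_b, steps=5)
# path[0] reconstructs x_a with its own class code; path[-1] carries x_b's
# class code but still x_a's individual code.
```

In the paper's setting, each point along such a path is decoded into a real-looking image, and the step at which the classifier's prediction flips localizes the class-associated features that explain the decision.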