In this paper, we introduce Posthoc Interpretation via Quantization (PIQ), a new approach for interpreting the decisions of trained classifiers. Our method uses vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input that the classifier deems relevant for its prediction. Our model formulation also enables learning concepts by incorporating supervision from pretrained annotation models, such as state-of-the-art image segmentation models. We evaluate our method through quantitative and qualitative studies involving black-and-white images, color images, and audio. We find that PIQ generates interpretations that participants in our user studies understand more easily than those produced by several other interpretation methods in the literature.
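To make the class-specific quantization step concrete, the following is a minimal sketch rather than the PIQ implementation. It assumes the standard vector-quantization rule (each latent vector is replaced by its nearest codebook entry under Euclidean distance) and one codebook per class, selected by the classifier's predicted class. The function name `quantize`, the array shapes, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize(latents, codebooks, class_id):
    """Snap each latent vector to the nearest entry of one class's codebook.

    latents:   (n, d) array of continuous representations (assumed shape)
    codebooks: (num_classes, k, d) array, one codebook of k entries per class
    class_id:  integer class predicted by the classifier
    Returns the quantized vectors (n, d) and the chosen entry indices (n,).
    """
    codebook = codebooks[class_id]  # (k, d): only this class's entries are usable
    # Squared Euclidean distance between every latent and every codebook entry.
    diffs = latents[:, None, :] - codebook[None, :, :]  # (n, k, d)
    dists = (diffs ** 2).sum(axis=-1)                   # (n, k)
    indices = dists.argmin(axis=1)                      # nearest entry per latent
    return codebook[indices], indices


# Illustrative data: 3 classes, 8 codebook entries each, 4-dimensional latents.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(3, 8, 4))
latents = rng.normal(size=(5, 4))
quantized, indices = quantize(latents, codebooks, class_id=1)
```

Because the lookup is restricted to the codebook of the predicted class, the discrete representation can only be built from entries tied to that class, which is the bottleneck effect the abstract describes. How the codebooks are trained is outside the scope of this sketch.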