In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers. Our method uses vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input data that the classifier deems relevant for its prediction. We evaluated our method through quantitative and qualitative studies and found that participants in our user studies understood the interpretations generated by PIQ more easily than those produced by several other interpretation methods from the literature.
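To make the "class-specific codebook as a bottleneck" idea concrete, the following is a minimal sketch of plain nearest-neighbour vector quantization (as in VQ-VAE) restricted to one class's codewords. It is not the authors' implementation. The layout of the codebook (a single array whose rows `[c*K, (c+1)*K)` belong to class `c`), the choice of the classifier's predicted label to select the sub-codebook, and the function name `class_specific_quantize` are all hypothetical assumptions made for illustration.

```python
import numpy as np


def class_specific_quantize(z, codebook, predicted_class, codes_per_class):
    """Quantize latent vectors using only the codewords of one class.

    Hypothetical sketch, not the PIQ reference implementation.
    z:        (n, d) array of latent vectors to quantize.
    codebook: (num_classes * codes_per_class, d) array; rows
              [c * K, (c + 1) * K) are ASSUMED to belong to class c.
    Returns the quantized vectors and their global codebook indices.
    """
    start = predicted_class * codes_per_class
    class_codes = codebook[start:start + codes_per_class]  # (K, d)

    # Squared Euclidean distance from every latent to every class codeword.
    dists = ((z[:, None, :] - class_codes[None, :, :]) ** 2).sum(axis=-1)  # (n, K)

    # Nearest codeword within the selected class only: this restriction is
    # what makes the codebook act as a class-specific bottleneck.
    local_idx = dists.argmin(axis=1)
    z_q = class_codes[local_idx]
    return z_q, start + local_idx


# Toy example: 2 classes, 2 codewords per class, 2-D latents.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
z = np.array([[0.9, 0.1], [0.2, -0.1]])

z_q0, idx0 = class_specific_quantize(z, codebook, predicted_class=0, codes_per_class=2)
z_q1, idx1 = class_specific_quantize(z, codebook, predicted_class=1, codes_per_class=2)
print(idx0.tolist(), idx1.tolist())
```

With `predicted_class=0`, each latent snaps to its nearest class-0 codeword (indices `[1, 0]`). With `predicted_class=1`, both latents are forced onto class-1 codewords even though those lie far away. This illustrates how restricting the lookup to one class's codebook constrains what the downstream interpreter can represent.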