Interpretability is highly desirable for deep neural network-based classifiers, especially when they inform high-stakes decisions in medical imaging. Commonly used post-hoc interpretability methods are limited in that they can produce plausible but different interpretations of a given model, leaving it ambiguous which one to choose. To address this problem, a novel decision-theory-motivated approach is investigated for establishing a self-interpretable model from a given pretrained deep binary black-box medical image classifier. The approach combines a self-interpretable encoder-decoder model with a single-layer fully connected network whose weights are fixed to unity. The model is trained to estimate the test statistic of the given black-box deep binary classifier, so that it maintains a similar accuracy. The decoder output, referred to as an equivalency map, is a transformed version of the to-be-classified image that, when processed by the fixed fully connected layer, produces the same test statistic value as the original classifier. The equivalency map visualizes the transformed image features that directly contribute to the test statistic value and, moreover, permits quantification of their relative contributions. Unlike traditional post-hoc interpretability methods, the proposed method is self-interpretable, quantitative, and fundamentally based on decision theory. Detailed quantitative and qualitative analyses have been performed on three different medical image binary classification tasks.