Deep neural networks have achieved strong performance in medical image classification, but they often behave as black boxes. Commonly used post-hoc interpretation methods typically provide heuristic visualizations whose relationship to the classifier's predictive distribution is indirect. This work introduces a local sensitivity analysis framework based on the input-dependent Fisher Information Matrix (iFIM) of a trained classifier. The iFIM characterizes how the classifier's predictive distribution changes under infinitesimal perturbations of the input image. Using a Gram-matrix formulation, the nonzero eigenspectrum of the iFIM can be recovered without explicitly forming the full image-dimensional Fisher matrix. The leading iFIM eigenspace is then used to decompose an input image into a high local-sensitivity component and its orthogonal complement. These components provide a model-intrinsic description of local predictive sensitivity, rather than a conventional pixel-wise attribution heatmap or a causal segmentation of task-relevant anatomy. The framework is evaluated on controlled and clinical medical image classification tasks using multiple classifier architectures. Perturbation-based experiments show that high-sensitivity iFIM components are more strongly coupled to changes in predictive confidence and classification performance than the lower-sensitivity complementary components. The results support the iFIM framework as a principled tool for analyzing local decision sensitivity and for complementing existing attribution-based interpretability methods in medical imaging.
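The Gram-matrix step and the eigenspace decomposition described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the iFIM is taken to be the standard Fisher information of the categorical predictive distribution with respect to the input, F(x) = Σ_c p(c|x) g_c g_cᵀ with g_c = ∇ₓ log p(c|x); the classifier is a toy linear softmax model so that the gradients are available in closed form (for a deep network they would come from automatic differentiation); and the function names `ifim_eigs_via_gram` and `split_by_sensitivity` are hypothetical. With A holding the columns √p_c · g_c, F = A Aᵀ is D×D, while the Gram matrix Aᵀ A is only C×C and shares the same nonzero eigenvalues.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ifim_eigs_via_gram(W, b, x, tol=1e-10):
    """Nonzero eigenpairs of F(x) = sum_c p_c g_c g_c^T without forming F.

    Assumes a toy linear softmax classifier with logits z = W x + b,
    for which g_c = grad_x log p(c|x) = W[c] - sum_k p_k W[k].
    """
    p = softmax(W @ x + b)                 # (C,) predictive distribution
    g = W - p @ W                          # (C, D); row c is g_c
    A = (np.sqrt(p)[:, None] * g).T        # (D, C), so that F = A @ A.T
    gram = A.T @ A                         # (C, C) Gram matrix
    lam, V = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]          # re-sort to descending
    lam, V = lam[order], V[:, order]
    keep = lam > tol * lam[0]              # drop the numerically zero modes
    lam, V = lam[keep], V[:, keep]
    U = A @ V / np.sqrt(lam)               # (D, r) orthonormal eigenvectors of F
    return lam, U

def split_by_sensitivity(x, U, k):
    """Project x onto the top-k eigenspace and return (high, orthogonal) parts."""
    Uk = U[:, :k]
    x_high = Uk @ (Uk.T @ x)
    return x_high, x - x_high
```

In this sketch the eigendecomposition costs O(C³) for C classes instead of O(D³) for D pixels, which is the practical point of the Gram formulation. Because the weighted gradients satisfy Σ_c p_c g_c = 0, at most C − 1 eigenvalues are nonzero, so the high-sensitivity subspace is very low-dimensional relative to the image.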