Interpretable deep learning models have received widespread attention in the field of image recognition. However, because of the distinctive multi-instance learning nature of medical images and the difficulty of identifying decision-making regions, many existing interpretable models still suffer from insufficient accuracy and interpretability in medical image disease diagnosis. To address these problems, we propose the feature-driven inference network (FeaInfNet). First, we propose a feature-based network reasoning structure and apply it to FeaInfNet. A network with this structure compares each sub-region image patch with the disease templates and normal templates that may appear in that region, and then combines the per-region comparisons to make the final diagnosis. This simulates the diagnostic process of doctors, making the model's reasoning interpretable while preventing normal regions that participate in reasoning from misleading the decision. Second, we propose local feature masks (LFM) for extracting feature vectors, which supply these vectors with global information and thus enhance the expressive ability of FeaInfNet. Finally, we propose adaptive dynamic masks (Adaptive-DM), which interpret feature vectors and prototypes as human-understandable image patches to provide accurate visual interpretation. We conducted qualitative and quantitative experiments on multiple publicly available medical datasets, including RSNA, iChallenge-PM, Covid-19, ChinaCXRSet, and MontgomerySet. The results show that our method achieves state-of-the-art classification accuracy and interpretability compared with baseline methods in medical image diagnosis. Additional ablation studies verify the effectiveness of each proposed component.
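The feature-based reasoning structure described in the abstract can be illustrated with a minimal sketch. This is not the FeaInfNet implementation: the abstract does not specify the similarity measure or how the per-region comparisons are combined, so the cosine similarity, the "best disease match minus best normal match" evidence score, the max-over-regions rule, and every function and parameter name below are assumptions chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def region_evidence(patch, disease_templates, normal_templates):
    """Evidence for disease in one sub-region (hypothetical scoring rule):
    best match to a disease template minus best match to a normal template."""
    best_disease = max(cosine(patch, t) for t in disease_templates)
    best_normal = max(cosine(patch, t) for t in normal_templates)
    return best_disease - best_normal


def diagnose(patches, disease_templates, normal_templates, threshold=0.0):
    """Combine per-region evidence into one decision.

    patches[i] is the feature vector of sub-region i; disease_templates[i]
    and normal_templates[i] are the templates that may appear in that region.
    Taking the maximum over regions is an assumption, not the paper's rule.
    """
    evidence = [
        region_evidence(p, d, n)
        for p, d, n in zip(patches, disease_templates, normal_templates)
    ]
    top_region = max(range(len(evidence)), key=evidence.__getitem__)
    return evidence[top_region] > threshold, top_region, evidence


# Toy example with two sub-regions and 2-D feature vectors.
patches = [[1.0, 0.0], [0.0, 1.0]]
disease = [[[0.0, 1.0]], [[0.0, 1.0]]]
normal = [[[1.0, 0.0]], [[1.0, 0.0]]]
print(diagnose(patches, disease, normal))  # (True, 1, [-1.0, 1.0])
```

In this toy case the first sub-region matches its normal template and contributes negative evidence, while the second matches a disease template and drives the decision. Returning the index of the deciding region mirrors the idea that the diagnosis can be traced back to a specific image patch.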