The interpretability of deep neural networks has attracted considerable interest in the medical and healthcare domain. This attention stems from concerns about the transparency, the legal and ethical implications, and the medical significance of the predictions that deep neural networks produce in clinical decision support systems. To address these concerns, we study four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Using transfer learning on a multi-label, multi-class chest radiography dataset, we interpret predictions for specific pathology classes. Our analysis covers both single-label and multi-label predictions and combines quantitative and qualitative evaluation, each compared against human expert annotation, to provide a comprehensive and unbiased assessment. Grad-CAM performs best in the quantitative evaluation, while the LIME heatmap score segmentation visualization shows the highest medical significance. We report both the outcomes and the challenges of this holistic approach to assessing interpretability methods, and we suggest that a multimodal approach, incorporating sources of information beyond chest radiography images, could offer additional insights for enhancing interpretability in the medical domain.