While deep learning techniques have achieved state-of-the-art performance in various clinical tasks, explaining their decision-making process can greatly enhance the credibility of these methods, enabling safer and quicker clinical adoption. Owing to its flexibility, Gradient-weighted Class Activation Mapping (Grad-CAM) has been widely adopted to provide intuitive visual interpretations of the reasoning of various deep learning models in computer-assisted diagnosis. However, despite the technique's popularity, there is still no systematic study of Grad-CAM's performance across different deep learning architectures. In this study, we investigate its robustness and effectiveness across several popular deep learning models, focusing on the impact of network depth and architecture type, using automatic pneumothorax diagnosis in X-ray scans as a case study. Our results show that deeper neural networks do not necessarily yield a substantial improvement in pneumothorax diagnosis accuracy, and that the effectiveness of Grad-CAM also varies across network architectures.
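For readers unfamiliar with how Grad-CAM produces its heatmaps, the sketch below shows the core computation of the standard formulation (Selvaraju et al.), not the authors' code. Given the K feature maps A^k of a chosen convolutional layer and the gradients of the class score with respect to them, each map is weighted by the spatial average of its gradients, the weighted maps are summed, and a ReLU keeps only positive evidence. The function name `grad_cam`, the final rescaling to [0, 1], and the toy 2x2 inputs are illustrative assumptions; in practice the gradients come from the framework's autograd, and the map is upsampled to the input image size, which is omitted here.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Combine K feature maps (each H x W) into a Grad-CAM heatmap.

    activations[k][i][j]: value of feature map k at spatial position (i, j)
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j])
    Hypothetical helper for illustration; plain lists instead of tensors.
    """
    num_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights alpha_k: global average pooling of each map's gradients.
    alphas = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_maps)
    ]

    # Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    cam = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = sum(alphas[k] * activations[k][i][j] for k in range(num_maps))
            cam[i][j] = max(total, 0.0)

    # Assumed display step: rescale to [0, 1]; an all-zero map is left unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy example: map 0 has positive gradients, map 1 has negative gradients.
    acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```

In the toy example, map 1 receives a negative weight, so the positions where it is active are clipped to zero by the ReLU and only the regions supported by map 0 remain highlighted.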