Evaluating Post-hoc Interpretability with Intrinsic Interpretability

Although Convolutional Neural Networks (CNNs) have reached human-level performance in some medical tasks, their clinical use has been hindered by their lack of interpretability. Two major interpretability strategies have been proposed to tackle this problem: post-hoc methods and intrinsic methods. Several post-hoc methods exist for interpreting deep learning (DL) models, but the explanations they produce vary significantly from one method to another, and they are difficult to validate because no ground truth is available. To address this challenge, we adapted the intrinsically interpretable ProtoPNet to histopathology imaging and compared the attribution maps it produces with the saliency maps produced by post-hoc methods. To evaluate the similarity between the post-hoc saliency maps and the ProtoPNet attribution maps, we adapted 10 saliency metrics from the saliency model literature. We validated the proposed approach on PatchCamelyon, a breast cancer metastasis detection dataset containing 327,680 patches of histopathological images of sentinel lymph node sections. Overall, SmoothGrad and Occlusion showed a statistically significantly larger overlap with ProtoPNet, while Deconvolution and LIME showed the least.
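As a rough illustration of what comparing a post-hoc saliency map with an attribution map can look like, the sketch below computes two similarity scores that are common in the saliency model literature: the Pearson correlation coefficient (CC) and the histogram-intersection similarity (SIM). The abstract does not list the 10 metrics that were adapted, so the choice of CC and SIM is an assumption, as are the function names and the representation of maps as 2D lists of non-negative values.

```python
import math


def _flatten(saliency_map):
    """Flatten a 2D list of numbers into a flat list of floats."""
    return [float(v) for row in saliency_map for v in row]


def pearson_cc(map_a, map_b):
    """Pearson correlation between two maps of equal size (hypothetical helper)."""
    a, b = _flatten(map_a), _flatten(map_b)
    if len(a) != len(b):
        raise ValueError("maps must have the same number of pixels")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # A constant map has zero variance; report 0.0 rather than dividing by zero.
    return cov / denom if denom > 0 else 0.0


def similarity(map_a, map_b):
    """Histogram intersection (SIM) of two non-negative maps.

    Each map is normalised to sum to 1, then the pixel-wise minima are summed.
    The result is 1.0 for identical distributions and 0.0 for disjoint ones.
    """
    a, b = _flatten(map_a), _flatten(map_b)
    if len(a) != len(b):
        raise ValueError("maps must have the same number of pixels")
    sum_a, sum_b = sum(a), sum(b)
    if sum_a <= 0 or sum_b <= 0:
        raise ValueError("maps must be non-negative with a positive sum")
    return sum(min(x / sum_a, y / sum_b) for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 2x2 maps standing in for a post-hoc saliency map and an attribution map.
    post_hoc = [[0.0, 1.0], [2.0, 3.0]]
    prototype = [[0.0, 1.0], [2.0, 4.0]]
    print("CC :", pearson_cc(post_hoc, prototype))
    print("SIM:", similarity(post_hoc, prototype))
```

Under these assumptions, a higher CC or SIM between a post-hoc method's map and the ProtoPNet attribution map would indicate a larger overlap for that image; a statistical comparison across methods would aggregate such scores over many patches.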

