Are Deep Learning Classification Results Obtained on CT Scans Fair and Interpretable?

Following the success of deep learning methods in image and object classification, the biomedical image processing community has seen a flood of applications to automatic diagnosis. Unfortunately, most deep learning-based classification studies in the literature focus solely on achieving very high accuracy scores, without considering interpretability or patient-wise separation of training and test data. For example, most lung nodule classification papers using deep learning randomly shuffle the data and split it into training, validation, and test sets, so that some images from a person's CT scan end up in the training set while other images of the same person end up in the validation or test sets. This can result in misleading reported accuracy rates and the learning of irrelevant features, ultimately reducing the real-life usability of these models. When deep neural networks trained with this traditional, unfair data shuffling are challenged with images of new patients, the trained models are observed to perform poorly. In contrast, deep neural networks trained with strict patient-level separation maintain their accuracy rates even when tested on new patient images. Heat-map visualizations of the activations of the networks trained with strict patient-level separation indicate a higher degree of focus on the relevant nodules. We argue that the research question posed in the title has a positive answer only if the deep neural networks are trained with images of patients that are strictly isolated from the validation and testing patient sets.
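The patient-level separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `patient_level_split`, the split fractions, and the toy patient IDs are all assumptions made for illustration. The idea is to shuffle and partition patients rather than individual images, so that every image of a given patient lands in exactly one subset.

```python
import random
from collections import defaultdict


def patient_level_split(patient_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Return train/val/test image indices with no patient shared across subsets.

    patient_ids[i] is the (hypothetical) patient identifier of image i.
    """
    # Group image indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle the patients, not the images.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Reserve at least one patient each for validation and testing.
    n_test = max(1, round(len(patients) * test_frac))
    n_val = max(1, round(len(patients) * val_frac))
    groups = {
        "test": patients[:n_test],
        "val": patients[n_test:n_test + n_val],
        "train": patients[n_test + n_val:],
    }
    # Expand each group of patients back into image indices.
    return {name: [i for p in pids for i in by_patient[p]]
            for name, pids in groups.items()}


# Toy data (assumed): 10 patients with 4 CT slices each.
patient_ids = [f"P{n:02d}" for n in range(10) for _ in range(4)]
split = patient_level_split(patient_ids)
```

A random shuffle over image indices would instead scatter the slices of one patient across all three subsets, which is the leakage the abstract warns about. For real projects, scikit-learn's group-aware splitters such as `GroupShuffleSplit` or `GroupKFold` provide comparable behaviour.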

