Are Deep Learning Classification Results Obtained on CT Scans Fair and Interpretable?

Following the great success of deep learning methods in image and object classification, the biomedical image processing community has seen a flood of applications to automatic diagnosis. Unfortunately, most deep learning-based classification studies in the literature focus solely on achieving extreme accuracy scores, without considering interpretability or patient-wise separation of training and test data. For example, most lung nodule classification papers using deep learning randomly shuffle the data and split it into training, validation, and test sets, so that some images from a person's CT scan end up in the training set while other images of the same person end up in the validation or test sets. This can lead to misleading reported accuracy rates and to the learning of irrelevant features, ultimately reducing the real-life usability of these models. When deep neural networks trained with this traditional, unfair data shuffling are challenged with images of new patients, the trained models are observed to perform poorly. In contrast, deep neural networks trained with strict patient-level separation maintain their accuracy rates even when tested on images of new patients. Heat-map visualizations of the activations of the networks trained with strict patient-level separation indicate a higher degree of focus on the relevant nodules. We argue that the research question posed in the title has a positive answer only if the deep neural networks are trained on images of patients who are strictly isolated from the validation and test patient sets.
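The patient-level separation described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `patient_level_split`, the `(patient_id, slice_id)` record layout, the split fractions, and the toy data are all assumptions made for illustration. The only point it demonstrates is that patients, rather than individual CT slices, are shuffled and assigned to subsets.

```python
import random
from collections import defaultdict


def patient_level_split(slices, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (patient_id, slice_id) records so no patient spans two subsets.

    Hypothetical helper for illustration; fractions are (train, val, test).
    """
    # Group every slice under the patient it came from.
    by_patient = defaultdict(list)
    for patient_id, slice_id in slices:
        by_patient[patient_id].append((patient_id, slice_id))

    # Shuffle the patients, not the slices. A slice-level shuffle is what
    # lets images of one person leak into both training and test sets.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }

    # Expand each group of patients back into its slices.
    return {
        name: [record for p in ids for record in by_patient[p]]
        for name, ids in groups.items()
    }


# Toy data (assumed): 20 patients with 5 slices each.
slices = [(f"patient_{p:02d}", k) for p in range(20) for k in range(5)]
split = patient_level_split(slices)

# The set of patients present in each subset; these should never overlap.
patients_in = {name: {p for p, _ in items} for name, items in split.items()}
print({name: len(ids) for name, ids in patients_in.items()})
```

With a split like this, every image of a given person lands in exactly one subset, so test accuracy is measured only on patients the network has never seen. Libraries such as scikit-learn offer group-aware splitters that serve the same purpose.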

