Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models

Learning · contrastive · autoencoder · dataset · reducible

August 24, 2023

Daniel Wolf, Tristan Payer, Catharina Silvia Lisson, Christoph Gerhard Lisson, Meinrad Beer, Timo Ropinski, Michael Götz

From arXiv. This paper is under review; the code will be released if it is accepted.

Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. To this end, we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Because obtaining sufficient annotated training data is difficult in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose SparK pre-training for medical imaging tasks with only small annotated datasets.
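
The abstract does not say how the fine-tuning set is reduced, so the following is only a minimal sketch of one plausible way to build progressively smaller training subsets. It assumes class-stratified, nested sampling; the function name `nested_subsets`, the fractions, and the toy labels are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def nested_subsets(labels, fractions, seed=0):
    """Return {fraction: sorted sample indices}, stratified by class.

    Hypothetical helper, not the paper's protocol. Each class is shuffled
    once and subsets are taken as prefixes, so every smaller subset is
    contained in every larger one.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class once; prefixes of this order form the subsets.
    for indices in by_class.values():
        rng.shuffle(indices)

    subsets = {}
    for frac in fractions:
        chosen = []
        for indices in by_class.values():
            # Keep at least one sample per class so no class disappears.
            keep = max(1, round(frac * len(indices)))
            chosen.extend(indices[:keep])
        subsets[frac] = sorted(chosen)
    return subsets


# Toy example: 100 samples with an 80/20 class imbalance (made-up data).
labels = [0] * 80 + [1] * 20
subsets = nested_subsets(labels, fractions=[1.0, 0.5, 0.25, 0.1])
for frac, indices in subsets.items():
    print(f"fraction={frac}: {len(indices)} samples")
```

Under these assumptions, each pre-trained model would be fine-tuned once per subset and evaluated on the same held-out test set, so that any change in performance can be attributed to the amount of annotated data rather than to which samples happened to be drawn.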
