Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets for downstream applications that suffer from a scarcity of labeled data. While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. In this paper, we address these issues by i) leveraging a large-scale dataset of 39k 3D brain MRI volumes, ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework, and iii) employing a robust development framework with 5 development and 8 testing brain MRI segmentation datasets, which allowed performance-driven design decisions to optimize the simple concept of Masked Autoencoders (MAEs) for 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points, setting a new state of the art. Our code and models are made available here.
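For reference, the metric behind the reported gain can be sketched as follows. This is a minimal illustration, not the evaluation code of this work: the function names `dice_score` and `dice_points`, the toy masks, and the empty-mask convention are assumptions introduced here, and a "Dice point" is read as one percentage point of the Dice similarity coefficient.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def dice_points(pred, target):
    """Dice score on a 0-100 scale, so a difference of 1.0 is one 'Dice point'."""
    return 100.0 * dice_score(pred, target)


# Toy 8-voxel example (hypothetical masks, not data from the paper).
target_mask = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_mask = [1, 1, 0, 0, 0, 0, 0, 0]
improved_mask = [1, 1, 1, 0, 0, 0, 0, 0]

gain = dice_points(improved_mask, target_mask) - dice_points(baseline_mask, target_mask)
print(f"gain in Dice points: {gain:.2f}")
```

In practice such scores are computed per structure and per case and then averaged over the test datasets, so an average gain of about 3 points summarizes many individual comparisons.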