Self-supervised learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets for the many downstream applications that suffer from a scarcity of labeled data. While SSL has revolutionized fields such as natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. We address these issues by i) leveraging a large-scale dataset of 44k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework. iii) A robust development framework, comprising 5 development and 8 testing brain MRI segmentation datasets, allowed performance-driven design decisions that adapt the simple concept of Masked Autoencoders (MAEs) to 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points. Furthermore, our model demonstrates exceptional stability, achieving the best average rank of 2 across the 7 evaluated methods, compared with a mean rank of 3 for the second-best method.
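The MAE concept the abstract refers to can be illustrated with a minimal sketch: a fixed fraction of non-overlapping 3D patches is masked out, and the network is later trained to reconstruct the hidden voxels. The code below is a simplified NumPy illustration of this masking step only, with hypothetical patch size and mask ratio; it is not the authors' implementation.

```python
import numpy as np

def mask_3d_volume(volume, patch=8, mask_ratio=0.75, seed=None):
    """Randomly zero out a fraction of non-overlapping 3D patches.

    Returns the masked volume and the patch-level boolean mask
    (True = masked). Assumes each spatial dimension of `volume`
    is divisible by `patch`. Parameter values here are illustrative.
    """
    rng = np.random.default_rng(seed)
    d, h, w = (s // patch for s in volume.shape)
    n_patches = d * h * w
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, as a flat boolean vector.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(d, h, w)

    # Upsample the patch-level mask to voxel resolution and apply it.
    voxel_mask = np.kron(grid, np.ones((patch, patch, patch), dtype=bool))
    masked = volume.copy()
    masked[voxel_mask] = 0.0
    return masked, grid

# Toy example: a random 32^3 "volume" with 75% of 8^3 patches masked.
vol = np.random.rand(32, 32, 32).astype(np.float32)
masked, grid = mask_3d_volume(vol, patch=8, mask_ratio=0.75, seed=0)
```

In a full MAE pipeline, the reconstruction loss would then be computed only on the masked voxels, so the encoder must infer anatomy from the visible context.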