The masked autoencoder (MAE) has emerged as a promising self-supervised pre-training technique for improving the representation learning of a neural network without human intervention. When MAE is adapted to volumetric medical images, existing methods face two challenges: first, they lack the global information crucial for understanding the clinical context of the data as a whole; second, nothing guarantees that the representations learned from randomly masked inputs are stable. To tackle these limitations, we propose the Global-Local Masked AutoEncoder (GL-MAE), a simple yet effective self-supervised pre-training strategy. GL-MAE reconstructs both masked global and masked local volumes, which enables it to learn essential local details as well as global context. We further introduce global-to-global consistency and local-to-global correspondence via global-guided consistency learning to enhance and stabilize the representation learning of the masked volumes. Fine-tuning results on multiple datasets illustrate the superiority of our method over other state-of-the-art self-supervised algorithms, demonstrating its effectiveness on a variety of volumetric medical image segmentation tasks, even when annotations are scarce. Code and models will be released upon acceptance.
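As a rough illustration of the global and local masked inputs described above, the following NumPy sketch builds one global view and one local view of a volume, masks random cubic patches in each, and scores a reconstruction on the masked voxels only. It is not the released GL-MAE code: the function names, the strided downsampling used for the global view, the patch size, the mask ratio, and the zero-filling of masked patches are all assumptions made for this example, and no network or consistency loss is included.

```python
import numpy as np


def global_view(volume, factor=2):
    """Whole volume at lower resolution (assumption: plain strided subsampling)."""
    return volume[::factor, ::factor, ::factor]


def local_view(volume, size, rng):
    """Random full-resolution cubic crop with edge length `size`."""
    starts = [int(rng.integers(0, dim - size + 1)) for dim in volume.shape]
    z, y, x = starts
    return volume[z:z + size, y:y + size, x:x + size]


def random_patch_mask(volume, patch=4, mask_ratio=0.75, rng=None):
    """Zero out a random subset of non-overlapping cubic patches.

    Returns the masked volume and a boolean voxel mask (True = masked).
    """
    rng = np.random.default_rng() if rng is None else rng
    if any(dim % patch != 0 for dim in volume.shape):
        raise ValueError("each volume dimension must be divisible by `patch`")
    grid = tuple(dim // patch for dim in volume.shape)
    n_patches = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, then expand the choice to voxel level.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patch_mask = flat.reshape(grid)
    voxel_mask = (
        patch_mask.repeat(patch, axis=0)
        .repeat(patch, axis=1)
        .repeat(patch, axis=2)
    )
    masked = np.where(voxel_mask, 0.0, volume)
    return masked, voxel_mask


def masked_mse(prediction, target, voxel_mask):
    """Mean squared reconstruction error over the masked voxels only."""
    diff = prediction[voxel_mask] - target[voxel_mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.standard_normal((32, 32, 32))  # stand-in for a CT/MRI volume

    g = global_view(volume, factor=2)     # shape (16, 16, 16)
    l = local_view(volume, size=16, rng=rng)  # shape (16, 16, 16)

    g_masked, g_mask = random_patch_mask(g, rng=rng)
    l_masked, l_mask = random_patch_mask(l, rng=rng)

    # A real model would predict the hidden voxels; here the masked input
    # itself is scored as a trivial baseline "reconstruction".
    print("global baseline error:", masked_mse(g_masked, g, g_mask))
    print("local baseline error:", masked_mse(l_masked, l, l_mask))
```

In an actual pre-training setup, the masked global and local views would each be fed to an encoder-decoder, and the masked-voxel error would serve as the reconstruction term alongside the consistency terms the abstract mentions.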