The masked autoencoder (MAE) is a promising self-supervised pre-training technique that improves the representation learning of a neural network without human intervention. However, applying MAE directly to volumetric medical images poses two challenges: (i) a lack of the global information that is crucial for understanding the clinical context of the data as a whole, and (ii) no guarantee that the representations learned from randomly masked inputs are stable. To address these limitations, we propose the \textbf{G}lobal-\textbf{L}ocal \textbf{M}asked \textbf{A}uto\textbf{E}ncoder (GL-MAE), a simple yet effective self-supervised pre-training strategy. In addition to reconstructing masked local views, as in previous methods, GL-MAE learns global context by reconstructing masked global views. Furthermore, a complete global view serves as an anchor that guides the reconstruction and stabilizes learning through global-to-global and global-to-local consistency learning. Fine-tuning results on multiple datasets show that our method outperforms other state-of-the-art self-supervised algorithms across a range of volumetric medical image segmentation tasks, even when annotations are scarce. Our code and models will be released upon acceptance.
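The three kinds of input described above (a complete global view used as an anchor, a masked global view, and a masked local view) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `crop_view`, `random_patch_mask`, and `make_views`, the view sizes, the patch size, the mask ratio, and the choice to crop the local view directly from the full volume are all assumptions made for illustration. The encoder, decoder, reconstruction loss, and consistency losses are omitted because the abstract does not specify them.

```python
import numpy as np


def crop_view(volume, size, rng):
    """Randomly crop a cubic sub-volume with edge length `size`."""
    d, h, w = (int(rng.integers(0, dim - size + 1)) for dim in volume.shape)
    return volume[d:d + size, h:h + size, w:w + size]


def random_patch_mask(view, patch, mask_ratio, rng):
    """Zero out a random subset of non-overlapping cubic patches.

    Assumes every edge of `view` is divisible by `patch`.
    Returns the masked view and a boolean patch-level grid (True = masked).
    """
    grid = tuple(s // patch for s in view.shape)
    n_patches = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    mask = flat.reshape(grid)
    # Expand the patch-level mask to voxel resolution.
    voxel_mask = mask.repeat(patch, 0).repeat(patch, 1).repeat(patch, 2)
    return np.where(voxel_mask, 0.0, view), mask


def make_views(volume, rng, global_size=24, local_size=8,
               patch=4, mask_ratio=0.75):
    """Build the three inputs; all sizes and ratios are assumed values."""
    anchor = crop_view(volume, global_size, rng)      # complete global view
    masked_global, g_mask = random_patch_mask(anchor, patch, mask_ratio, rng)
    local = crop_view(volume, local_size, rng)        # smaller local view
    masked_local, l_mask = random_patch_mask(local, patch, mask_ratio, rng)
    return {
        "anchor": anchor,
        "masked_global": masked_global,
        "global_mask": g_mask,
        "masked_local": masked_local,
        "local_mask": l_mask,
    }


rng = np.random.default_rng(0)
volume = rng.random((32, 32, 32))  # stand-in for a CT or MRI volume
views = make_views(volume, rng)
```

In a full pipeline, the masked views would be passed to an encoder-decoder for reconstruction, while the unmasked `anchor` would supply the target for the consistency terms.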