Alzheimer's disease (AD) is an incurable neurodegenerative condition that leads to cognitive and functional deterioration. Because no cure exists, prompt and precise diagnosis of AD is vital, yet diagnosis is a complex process that depends on multiple factors and multi-modal data. Although multi-modal representation learning has been applied successfully to medical datasets, scant attention has been given to 3D medical images. In this paper, we propose the Contrastive Masked Vim Autoencoder (CMViM), the first efficient representation learning method tailored for 3D multi-modal data. The framework is built on a masked Vim autoencoder that learns a unified multi-modal representation and the long-range dependencies contained in 3D medical images. We also introduce an intra-modal contrastive learning module, which enhances the ability of the multi-modal Vim encoder to model discriminative features within the same modality, and an inter-modal contrastive learning module, which alleviates misaligned representations across modalities. The framework consists of two main steps: 1) incorporating Vision Mamba (Vim) into the masked autoencoder to reconstruct 3D masked multi-modal data efficiently; and 2) aligning the multi-modal representations with contrastive learning mechanisms from both intra-modal and inter-modal aspects. The framework is pre-trained on the ADNI2 dataset and validated on the downstream task of AD classification. The proposed CMViM yields a 2.7% improvement in AUC compared with other state-of-the-art methods.
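The alignment step described above can be illustrated with a small, hedged sketch. The abstract does not state the exact form of the contrastive objectives, so the code below assumes a standard InfoNCE loss over L2-normalized embeddings; the function names (`info_nce`, `inter_modal_loss`), the temperature value, and the toy embeddings for two hypothetical modalities are all illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (assumed form, not taken from the paper).

    anchors[i] is pulled toward positives[i] and pushed away from
    positives[j] for every j != i in the same batch.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def inter_modal_loss(modality_a, modality_b, temperature=0.1):
    """Symmetric cross-modal loss: row i of each modality comes from the same subject."""
    forward = info_nce(modality_a, modality_b, temperature)
    backward = info_nce(modality_b, modality_a, temperature)
    return 0.5 * (forward + backward)


# Toy embeddings for two subjects in two hypothetical modalities.
modality_a = [[1.0, 0.0], [0.0, 1.0]]
modality_b_aligned = [[0.9, 0.1], [0.1, 0.9]]   # same subject order as modality_a
modality_b_swapped = [[0.1, 0.9], [0.9, 0.1]]   # subjects mismatched

loss_aligned = inter_modal_loss(modality_a, modality_b_aligned)
loss_swapped = inter_modal_loss(modality_a, modality_b_swapped)
print(f"aligned: {loss_aligned:.4f}  swapped: {loss_swapped:.4f}")
```

Under these assumptions the loss is small when embeddings of the same subject agree across modalities and large when they are mismatched, which is the behavior an inter-modal alignment term is meant to reward. An intra-modal term could reuse `info_nce` with two views of the same modality as anchors and positives; that pairing is likewise an assumption here.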