High-dimensional structural MRI (sMRI) images are widely used for Alzheimer's Disease (AD) diagnosis. Most existing methods for sMRI representation learning rely on 3D architectures (e.g., 3D CNNs), slice-wise feature extraction with late aggregation, or training-free feature extraction with 2D foundation models (e.g., DINO). However, these three paradigms suffer from high computational cost, loss of cross-slice relations, and limited ability to extract discriminative features, respectively. To address these challenges, we propose Multimodal Visual Surrogate Compression (MVSC), which learns to compress and adapt large 3D sMRI volumes into compact 2D features, termed visual surrogates, that are better aligned with frozen 2D foundation models and thus yield powerful representations for AD classification. MVSC has two key components: a Volume Context Encoder that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner. Extensive experiments on three large-scale Alzheimer's disease benchmarks demonstrate that MVSC performs favourably against state-of-the-art methods on both binary and multi-class classification tasks.
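For intuition, the following is a minimal, runnable PyTorch sketch of the pipeline the abstract describes: a volume-level context module weights the slices of a 3D sMRI volume into a single 2D "visual surrogate", which a frozen 2D foundation model (DINO) then encodes. All internals here (tensor shapes, the pooling-based slice embedding, the global softmax slice weighting) are illustrative assumptions rather than the paper's implementation; in particular, the textual guidance of the Volume Context Encoder and the patch-wise fusion of the Adaptive Slice Fusion module are omitted.

```python
import torch
import torch.nn as nn

# Illustrative shapes (assumptions, not from the paper):
D, H, W = 144, 224, 224          # sMRI volume: D axial slices of H x W
EMB = 768                        # width of the per-slice context tokens

class VolumeContextEncoder(nn.Module):
    """Captures global cross-slice context. Simplified stand-in: plain
    self-attention over per-slice tokens, without the textual guidance."""
    def __init__(self, dim=EMB, depth=2):
        super().__init__()
        self.slice_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, vol):                  # vol: (B, 1, D, H, W)
        B, _, d, h, w = vol.shape
        slices = vol.transpose(1, 2).reshape(B * d, 1, h, w)
        tok = self.slice_embed(slices).flatten(2).mean(-1)   # (B*D, dim)
        return self.encoder(tok.view(B, d, -1))              # (B, D, dim)

class AdaptiveSliceFusion(nn.Module):
    """Aggregates slice-level information into one compact 2D surrogate.
    Simplified stand-in: one softmax weight per slice (the paper fuses
    in a text-enhanced, patch-wise manner)."""
    def __init__(self, dim=EMB):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, vol, ctx):              # ctx: (B, D, dim)
        w = self.score(ctx).softmax(dim=1)    # (B, D, 1) slice weights
        surrogate = (vol.squeeze(1) * w.unsqueeze(-1)).sum(dim=1)  # (B, H, W)
        return surrogate.unsqueeze(1).repeat(1, 3, 1, 1)     # (B, 3, H, W)

# Frozen 2D foundation model (official DINO ViT-S/16 torch.hub entry).
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
for p in backbone.parameters():
    p.requires_grad_(False)

vce, asf = VolumeContextEncoder(), AdaptiveSliceFusion()
vol = torch.randn(2, 1, D, H, W)              # dummy batch of 2 volumes
feats = backbone(asf(vol, vce(vol)))          # (2, 384) features for an AD classifier
```

Keeping the 2D backbone frozen and training only the compression modules is what makes the approach lightweight relative to a full 3D architecture, which is the computational argument the abstract makes.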