Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?

The emergence of large-scale pretrained foundation models has transformed computer vision, enabling strong performance across diverse downstream tasks. However, their potential for physics-based inverse problems, such as accelerated cardiac MRI reconstruction, remains largely underexplored. In this work, we investigate whether natural-domain foundation models can serve as effective image priors for accelerated cardiac MRI reconstruction, and we compare their performance against domain-specific counterparts such as BiomedCLIP. We propose an unrolled reconstruction framework that incorporates pretrained, frozen visual encoders, such as CLIP, DINOv2, and BiomedCLIP, within each cascade to guide the reconstruction process. Through extensive experiments, we show that while task-specific state-of-the-art reconstruction models such as E2E-VarNet achieve superior performance in standard in-distribution settings, foundation-model-based approaches remain competitive. More importantly, in challenging cross-domain scenarios, where models are trained on cardiac MRI and evaluated on anatomically distinct knee and brain datasets, foundation models exhibit improved robustness, particularly under high acceleration factors and limited low-frequency sampling. We further observe that natural-image-pretrained models, such as CLIP, learn highly transferable structural representations, while domain-specific pretraining (BiomedCLIP) provides modest additional gains in more ill-posed regimes. Overall, our results suggest that pretrained foundation models offer a promising source of transferable priors, enabling improved robustness and generalization in accelerated MRI reconstruction.
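To make the notion of an unrolled cascade concrete, the following is a minimal single-coil sketch in NumPy. It is not the framework proposed in this work: the abstract does not specify how the frozen encoder features enter each cascade, so the function `placeholder_prior` is a hypothetical stand-in (a simple 3x3 mean filter) for a learned refinement module guided by a frozen encoder such as CLIP or DINOv2. The mask generator, the function names, and the hard data-consistency step are likewise assumptions chosen for illustration. The parameters `accel` and `num_low_freq` correspond loosely to the acceleration factor and the low-frequency sampling budget mentioned above.

```python
import numpy as np


def make_mask(shape, accel=4, num_low_freq=8, seed=0):
    """Column-wise Cartesian mask (assumed sampling scheme): a fully sampled
    low-frequency band plus random columns at roughly 1/accel density."""
    rng = np.random.default_rng(seed)
    h, w = shape
    cols = rng.random(w) < 1.0 / accel
    center = w // 2  # zero frequency sits here after fftshift
    cols[center - num_low_freq // 2 : center + num_low_freq // 2] = True
    return np.broadcast_to(cols, (h, w)).astype(float)


def fft2c(x):
    """Centered, orthonormal 2D FFT (image -> k-space)."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x), norm="ortho"))


def ifft2c(k):
    """Centered, orthonormal 2D inverse FFT (k-space -> image)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k), norm="ortho"))


def placeholder_prior(x):
    """HYPOTHETICAL stand-in for the learned, encoder-guided refinement:
    a 3x3 mean filter with wrap-around borders. A real cascade would use a
    trainable network conditioned on frozen foundation-model features."""
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(x, (dy, dx), axis=(0, 1))
    return out / 9.0


def data_consistency(x, y, mask):
    """Hard data consistency: keep measured k-space samples, and use the
    current estimate only where nothing was measured."""
    k = fft2c(x)
    k = mask * y + (1.0 - mask) * k
    return ifft2c(k)


def unrolled_recon(y, mask, num_cascades=4):
    """Alternate a prior step and a data-consistency step for a fixed
    number of cascades, starting from the zero-filled reconstruction."""
    x = ifft2c(y)
    for _ in range(num_cascades):
        x = placeholder_prior(x)
        x = data_consistency(x, y, mask)
    return x


# Usage on synthetic data (a random array, not real MRI).
rng = np.random.default_rng(1)
image = rng.random((32, 32))
mask = make_mask(image.shape, accel=4, num_low_freq=8)
y = mask * fft2c(image)  # simulated undersampled k-space
recon = unrolled_recon(y, mask)
```

Because every cascade ends with the data-consistency step, the output agrees with the measured k-space samples wherever the mask is one. The prior step therefore only influences the unmeasured frequencies, which is where a transferable image prior would be expected to matter most at high acceleration.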
