Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints such as patient discomfort, long scan times, and high costs often leave modalities incomplete in clinical practice, compromising RT planning accuracy. Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lacking in clinical interpretability, and thus fail to meet the demands of NPC RT. Here, we developed a unified foundation model that integrates contrastive visual representation learning with vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model couples a contrastive encoder, which learns modality-invariant representations, with a CLIP-based text-informed decoder that produces semantically consistent images, so that a single foundation model supports synthesis from any available modality to all missing ones. Trained on 40,825 images from 13 institutions, it achieves consistently high performance (average SSIM 0.90, PSNR 27 dB) across 26 internal and external validation sites (15,748 images), with superior synthesis fidelity and robustness to noise and domain shifts. Its unified representation also enhances downstream RT-relevant tasks such as segmentation. This work advances digital medicine solutions for NPC care by using a foundation model to bridge technical image synthesis and clinical utility.
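To make the described design concrete, the following is a minimal sketch of how a contrastive encoder and a text-informed decoder might be wired together. It is not the paper's implementation: the 2D single-channel slices, toy CNN backbone, layer sizes, InfoNCE plus L1 loss combination, and the random tensor standing in for a CLIP text embedding of the target-modality prompt are all illustrative assumptions.

```python
# Illustrative sketch only; architecture details and losses are assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveEncoder(nn.Module):
    """Maps an MRI slice to a feature map plus a pooled embedding for the contrastive loss."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(dim, dim)  # projection head for contrastive learning

    def forward(self, x):
        feat = self.backbone(x)                      # (B, dim, H/8, W/8)
        z = self.proj(feat.mean(dim=(2, 3)))         # (B, dim) pooled embedding
        return feat, F.normalize(z, dim=-1)


class TextInformedDecoder(nn.Module):
    """Upsamples encoder features, modulated by a text embedding of the target modality."""

    def __init__(self, dim: int = 128, text_dim: int = 512):
        super().__init__()
        self.cond = nn.Linear(text_dim, dim)         # project the (assumed CLIP) text embedding
        self.up = nn.Sequential(
            nn.ConvTranspose2d(dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, feat, text_emb):
        gate = torch.sigmoid(self.cond(text_emb))    # (B, dim) channel-wise gate
        feat = feat * gate[:, :, None, None]         # condition features on the target modality
        return self.up(feat)


def info_nce(z_a, z_b, temperature: float = 0.07):
    """Symmetric InfoNCE: paired slices of the same anatomy in different modalities are positives."""
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    enc, dec = ContrastiveEncoder(), TextInformedDecoder()
    src = torch.randn(4, 1, 64, 64)                  # source-modality slices (e.g. T1-weighted)
    tgt = torch.randn(4, 1, 64, 64)                  # paired target-modality slices (e.g. T2-weighted)
    text_emb = torch.randn(4, 512)                   # stand-in for a CLIP embedding of a target-modality prompt
    feat, z_src = enc(src)
    _, z_tgt = enc(tgt)
    synth = dec(feat, text_emb)                      # synthesized target-modality slice
    loss = info_nce(z_src, z_tgt) + F.l1_loss(synth, tgt)
    print(synth.shape, float(loss))
```

In this reading, swapping the text prompt (and hence the conditioning embedding) is what lets one encoder-decoder pair cover any-to-all synthesis, while the contrastive term pushes paired slices from different modalities toward a shared, modality-invariant representation.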