Medical diagnosis requires the effective synthesis of visual manifestations and clinical metadata. However, existing methods often treat metadata as isolated tags, failing to exploit the rich semantic knowledge embedded in clinical descriptions. We propose PRIMA (Pre-training with Risk-integrated Image-Metadata Alignment), a framework that integrates domain-specific knowledge into multi-modal representation learning. We first curate an expert corpus of risk-disease correlations via Retrieval-Augmented Generation (RAG) and use it to refine Clinical ModernBERT, embedding diagnostic priors into the text encoder. To bridge the modality gap, we introduce a dual-encoder pre-training strategy built on DINOv3 and our refined BERT, optimized with four complementary loss functions that capture multi-granular semantic alignment and handle the ambiguity of clinical correlations through soft labels. Finally, we leverage Qwen-3 to fuse the aligned features for precise disease classification. Extensive experiments demonstrate that PRIMA effectively harmonizes pixel-level features with abstract clinical expertise, significantly outperforming state-of-the-art methods. Notably, our framework achieves this robustness without massive data collection or extensive computational resources. Our code will be made public upon acceptance.
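The abstract mentions handling ambiguous clinical correlations with soft labels during image-text alignment. As a minimal sketch of what such a soft-label alignment objective can look like (the function names, NumPy implementation, and temperature value are illustrative assumptions, not the paper's actual loss definitions): instead of matching each image to exactly one text with a one-hot target, the cross-entropy is computed against a soft target distribution over the batch.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable row-wise log-softmax.
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def soft_label_alignment_loss(img_emb, txt_emb, soft_targets, temperature=0.07):
    """Cross-entropy between image->text similarities and soft targets.

    soft_targets rows sum to 1; a one-hot matrix recovers the standard
    contrastive (InfoNCE-style) objective. All details here are a sketch,
    not PRIMA's exact formulation.
    """
    # L2-normalize so the dot products below are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # scaled similarity matrix
    # Soft cross-entropy: weight each pair's log-probability by its target.
    return float(-(soft_targets * log_softmax(logits)).sum(axis=1).mean())
```

A target row such as `[0.8, 0.1, 0.1]` instead of `[1, 0, 0]` expresses that the off-diagonal texts are partially related (e.g., overlapping risk factors), so the model is not penalized as if they were pure negatives.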