Electronic health records (EHRs) are multimodal by nature, combining structured tabular features such as lab tests with unstructured clinical notes. In real-world clinical practice, doctors draw on these complementary EHR data sources to build a clearer picture of a patient's health and to support clinical decision-making. However, most EHR predictive models do not mirror this practice: they either focus on a single modality or overlook inter-modality interactions and redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free-text clinical notes and masked tabular transformers trained on structured lab-test results. We design a disentangled transformer module, optimized by a mutual information loss, to 1) decouple modality-specific and modality-shared information and 2) extract a useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates strong potential for advancing clinical prediction, achieving an F1 score above 90% on a 10-disease multi-label classification task.
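The disentanglement step can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the embedding dimensions, projection matrices, and the cosine-similarity surrogate standing in for the mutual information loss are all illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted embeddings for a batch of 32 patients:
# text_emb from a fine-tuned LLM, lab_emb from a masked tabular transformer.
d_text, d_lab, d = 16, 8, 4
text_emb = rng.normal(size=(32, d_text))
lab_emb = rng.normal(size=(32, d_lab))

# Separate linear projections into modality-shared vs. modality-specific subspaces.
W_text_sh, W_text_sp = rng.normal(size=(d_text, d)), rng.normal(size=(d_text, d))
W_lab_sh, W_lab_sp = rng.normal(size=(d_lab, d)), rng.normal(size=(d_lab, d))

def row_cos(a, b):
    """Row-wise cosine similarity between two (batch, d) matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

sh_t, sp_t = text_emb @ W_text_sh, text_emb @ W_text_sp
sh_l, sp_l = lab_emb @ W_lab_sh, lab_emb @ W_lab_sp

# MI-inspired surrogate objective:
# pull the two shared views together, push shared apart from specific.
align = 1.0 - row_cos(sh_t, sh_l).mean()
disentangle = np.abs(row_cos(sh_t, sp_t)).mean() + np.abs(row_cos(sh_l, sp_l)).mean()
loss = align + disentangle

# Fused joint representation: shared part plus both modality-specific parts.
joint = np.concatenate([sh_t + sh_l, sp_t, sp_l], axis=1)
print(joint.shape)  # (32, 12)
```

In a trained model the projections would be learned end-to-end and the surrogate replaced by a proper mutual information estimator; the sketch only shows how the shared/specific split and the fused representation fit together.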