Contrastive Learning on Multimodal Analysis of Electronic Health Records

Electronic health record (EHR) systems contain a wealth of multimodal clinical data, including structured data such as clinical codes and unstructured data such as clinical notes. However, many existing EHR-focused studies have either concentrated on a single modality or merged different modalities in a rudimentary fashion. As a result, structured and unstructured data are often treated as separate entities, and the inherent synergy between them is neglected. Specifically, the two modalities contain clinically relevant, inextricably linked, and complementary health information, so analyzing them jointly captures a more complete picture of a patient's medical history. Despite the great success of multimodal contrastive learning in vision-language tasks, its potential remains under-explored for multimodal EHR data, particularly with respect to theoretical understanding. To accommodate the statistical analysis of multimodal EHR data, in this paper we propose a novel multimodal feature embedding generative model and design a multimodal contrastive loss to obtain multimodal EHR feature representations. Our theoretical analysis demonstrates the effectiveness of multimodal learning compared to single-modality learning and connects the solution of the loss function to the singular value decomposition of a pointwise mutual information matrix. This connection paves the way for a privacy-preserving algorithm tailored to multimodal EHR feature representation learning. Simulation studies show that the proposed algorithm performs well under a variety of configurations. We further validate the clinical utility of the proposed algorithm on real-world EHR data.
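
To make the link between a pointwise mutual information (PMI) matrix and its singular value decomposition more concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical cross-modal co-occurrence count matrix whose rows are structured codes and whose columns are features extracted from clinical notes. The function names, the choice to set PMI to zero for unobserved pairs, and the square-root scaling of the singular values are illustrative assumptions. Because only aggregate counts enter the computation, a procedure of this kind would not need patient-level records, which is one plausible reading of why the SVD connection is described as privacy-preserving.

```python
import numpy as np


def pmi_matrix(cooc):
    """Compute a PMI matrix from a (codes x note features) count matrix.

    Assumption: pairs that never co-occur get PMI 0 instead of -inf.
    """
    cooc = np.asarray(cooc, dtype=float)
    p_joint = cooc / cooc.sum()                  # joint probabilities p(i, j)
    p_row = p_joint.sum(axis=1, keepdims=True)   # marginal p(i) over codes
    p_col = p_joint.sum(axis=0, keepdims=True)   # marginal p(j) over note features
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_row * p_col))
    pmi[~np.isfinite(pmi)] = 0.0                 # zero counts or empty rows/columns
    return pmi


def svd_embeddings(pmi, rank):
    """Rank-`rank` embeddings for both modalities from a truncated SVD.

    Assumption: singular values are split evenly (square root) between
    the two sets of embeddings.
    """
    u, s, vt = np.linalg.svd(pmi, full_matrices=False)
    root = np.sqrt(s[:rank])
    code_emb = u[:, :rank] * root                # one row per structured code
    text_emb = vt[:rank, :].T * root             # one row per note feature
    return code_emb, text_emb


if __name__ == "__main__":
    # Hypothetical toy counts: 3 codes x 4 note features.
    counts = np.array([[8, 1, 0, 1],
                       [1, 9, 2, 0],
                       [0, 2, 7, 3]])
    pmi = pmi_matrix(counts)
    code_emb, text_emb = svd_embeddings(pmi, rank=2)
    print(code_emb.shape, text_emb.shape)
```

With this scaling, the inner product between a code embedding and a note-feature embedding approximates the corresponding PMI entry, and the approximation is exact when `rank` equals the rank of the PMI matrix.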
