Electronic Health Records (EHR) are generated during routine clinical care and record valuable information about broad patient populations, offering ample opportunities to improve patient management and intervention strategies in clinical practice. To exploit this potential, a popular paradigm in machine learning is EHR representation learning, which first uses a backbone model to learn informative representations from each individual patient's EHR data and then supports diverse downstream healthcare tasks built on those representations. Unfortunately, this paradigm does not analyze in depth how patients relate to one another, which is the concern of cohort studies in clinical practice. Specifically, patients in the same cohort tend to share similar characteristics, implying that they resemble one another in medical conditions such as symptoms or diseases. In this paper, we propose a universal COhort Representation lEarning (CORE) framework that augments EHR utilization by leveraging fine-grained cohort information among patients. In particular, CORE first develops an explicit patient modeling task based on prior knowledge of patients' diagnosis codes, which measures the latent relevance among patients in order to adaptively divide cohorts for each patient. Based on the constructed cohorts, CORE re-encodes the pre-extracted EHR data representations from intra- and inter-cohort perspectives, yielding augmented EHR representations. CORE is readily applicable to diverse backbone models, serving as a universal plug-in framework that infuses cohort information into healthcare methods to boost performance. We conduct an extensive experimental evaluation on two real-world datasets, and the results demonstrate the effectiveness and generalizability of CORE.
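To make the cohort idea concrete, the following is a minimal illustrative sketch and not the CORE implementation. It assumes, purely for illustration, that patient relevance is measured by Jaccard overlap of diagnosis-code sets, that a fixed threshold defines each patient's cohort, and that the intra-cohort view is a simple blend with the cohort mean; CORE instead learns the relevance through a patient modeling task and also uses an inter-cohort view, which this sketch omits. All function names, parameters, and data here are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two diagnosis-code sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_cohorts(codes: Dict[str, Set[str]],
                  threshold: float = 0.5) -> Dict[str, List[str]]:
    """Per-patient cohort: other patients whose overlap meets the threshold.

    Assumption: a fixed threshold stands in for a learned relevance score.
    """
    return {
        p: [q for q in codes if q != p and jaccard(codes[p], codes[q]) >= threshold]
        for p in codes
    }


def augment(reps: Dict[str, List[float]],
            cohorts: Dict[str, List[str]],
            weight: float = 0.5) -> Dict[str, List[float]]:
    """Blend each pre-extracted representation with its cohort's mean vector."""
    out: Dict[str, List[float]] = {}
    for p, vec in reps.items():
        members = cohorts.get(p, [])
        if not members:
            # No cohort members: keep the backbone representation unchanged.
            out[p] = list(vec)
            continue
        mean = [sum(reps[m][i] for m in members) / len(members)
                for i in range(len(vec))]
        out[p] = [(1 - weight) * v + weight * m for v, m in zip(vec, mean)]
    return out


# Hypothetical toy data: diagnosis codes and 2-d backbone representations.
codes = {"p1": {"I10", "E11"}, "p2": {"I10", "E11", "N18"}, "p3": {"J45"}}
reps = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [2.0, 2.0]}

cohorts = build_cohorts(codes)
augmented = augment(reps, cohorts)
```

In this toy setting, patients p1 and p2 share two of three codes and therefore fall into each other's cohorts, while p3 has no cohort and keeps its original representation. Because the augmentation only consumes pre-extracted vectors, the same step could sit on top of any backbone, which is the plug-in property the abstract describes.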