The vast repositories of Electronic Health Records (EHR) and medical claims hold untapped potential for studying rare but critical events, such as suicide attempts. Conventional approaches often model suicide attempt as a univariate outcome and exclude ``single-record'' patients, those with only one documented encounter, because they lack historical information. However, patients whose suicide attempt was diagnosed at their only encounter can, somewhat surprisingly, represent a substantial proportion of all attempt cases in the data, as high as 70--80%. We propose a hybrid and integrative learning framework that leverages concurrent outcomes as surrogates and harnesses the typically excluded yet valuable information in single-record data. Our approach employs a supervised learning component to learn the latent variables that connect primary outcomes (e.g., suicide) and surrogate outcomes (e.g., mental disorders) to historical information. It simultaneously employs an unsupervised learning component that utilizes the single-record data through the shared latent variables. As such, our approach offers a general strategy for information integration that is crucial to modeling rare conditions and events. Using hospital inpatient data from Connecticut, we demonstrate that single-record data and concurrent diagnoses indeed carry valuable information, and that utilizing them can substantially improve suicide risk modeling.