Predicting Hospitalization from a Whole-Person Health Score with Incomplete Electronic Health Records Data: A Case Study

Embedding a standardized whole-person health measure in electronic health records (EHR) could be instrumental to preventative care. The allostatic load index (ALI), calculated from ten component stressors across three body systems, offers a promising snapshot of holistic health. The ALI can be calculated from EHR data, but many components are often missing because not all patients undergo all tests. Statistical modeling and machine learning were applied to EHR data for $1000$ patients from a large academic health system to predict in-patient hospitalization (as a count or a binary outcome) from the ALI, controlling for age and sex. Various methods for handling patients' missing ALI components were evaluated, including summary measures that combine the components and approaches that use the components separately. Performance was measured using receiver operating characteristic (ROC) curves and the corresponding areas under the ROC curve (AUC). Modeling hospitalization as a count did not improve upon modeling it as a binary outcome, and logistic regression outperformed random forest. Overall, summary measures performed similarly, with the complete-case proportion (i.e., the proportion of non-missing components that were "unhealthy") performing best (AUC $= 0.64$), though by a margin of $\leq 0.01$. When using components separately, the pattern submodel approach most accurately predicted hospitalization in-sample (AUC $= 0.73$), but did not cross-validate as well (AUC $= 0.63$). In summary, all summary measures performed similarly, but when the ALI components were included separately, tailoring models to subsets of patients with the same missing data pattern performed best. Next steps include EHR implementation to enable prediction and support clinician decision-making at scale.
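To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each ALI component has already been reduced to a binary flag (1 for "unhealthy", 0 for "healthy") and that a missing component is encoded as `None`. The function names and the four-component toy patients are hypothetical, chosen only for illustration (the ALI itself has ten components). The first function computes the complete-case proportion as defined above. The other two group patients by their missing data pattern, which is the grouping a pattern submodel approach would use before fitting one model per group.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Flags = Sequence[Optional[int]]  # 1 = unhealthy, 0 = healthy, None = missing (assumed encoding)


def complete_case_proportion(flags: Flags) -> Optional[float]:
    """Proportion of non-missing components flagged unhealthy.

    Returns None when every component is missing, since the proportion
    is undefined in that case.
    """
    observed = [f for f in flags if f is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def missingness_pattern(flags: Flags) -> Tuple[bool, ...]:
    """Tuple marking which components are missing (True = missing)."""
    return tuple(f is None for f in flags)


def group_by_pattern(patients: Sequence[Flags]) -> Dict[Tuple[bool, ...], List[int]]:
    """Map each missingness pattern to the indices of patients who share it."""
    groups: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
    for idx, flags in enumerate(patients):
        groups[missingness_pattern(flags)].append(idx)
    return dict(groups)


# Toy data: three hypothetical patients with four components each.
toy_patients = [
    [1, 0, None, 1],
    [None, None, None, None],
    [0, 0, None, 1],
]
toy_scores = [complete_case_proportion(p) for p in toy_patients]
toy_groups = group_by_pattern(toy_patients)
```

In this toy data, the first and third patients share a pattern (only the third component is missing), so a pattern submodel approach would fit a single model to that pair using the three observed components. Patterns shared by very few patients leave little data per model, which may be one reason in-sample and cross-validated performance can diverge.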

