Embedding a standardized whole-person health measure in electronic health records (EHR) could be instrumental to preventative care. The allostatic load index (ALI), calculated from ten component stressors across three body systems, offers a promising snapshot of holistic health. The ALI can be calculated from EHR data, but many components are missing because not all patients undergo all tests. Statistical modeling and machine learning were applied to EHR data for $1000$ patients from a large academic health system to predict in-patient hospitalization (as a count or a binary outcome) from the ALI, controlling for age and sex. Various methods were evaluated for handling patients' missing ALI components, either by combining the components into a summary measure or by including them separately. Performance was measured using receiver operating characteristic (ROC) curves and the corresponding areas under the ROC curve (AUC). Count modeling of hospitalization did not improve upon binary modeling, and logistic regression outperformed random forest. The summary measures performed similarly overall, with the complete-case proportion (i.e., the proportion of non-missing components that were "unhealthy") performing best (AUC $= 0.64$), but by a margin of $\leq 0.01$. When the components were used separately, the pattern submodel approach most accurately predicted hospitalization in-sample (AUC $= 0.73$) but did not cross-validate as well (AUC $= 0.63$). In conclusion, the choice of summary measure made little difference; when the ALI components were included separately, however, tailoring models to subsets of patients with the same missing data pattern performed best. Next steps include EHR implementation to enable prediction and support clinician decision-making at scale.
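To make two of the quantities above concrete, the following is a minimal sketch rather than the study's implementation. It assumes each patient's ten ALI components have already been coded as `True` (unhealthy), `False` (healthy), or `None` (not measured). The function names and the example patients are hypothetical. The first function computes the complete-case proportion as defined above; the second groups patients by their missing data pattern, which is the grouping a pattern submodel approach would fit separate models on.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Flags = Sequence[Optional[bool]]  # one entry per ALI component; None = missing


def complete_case_proportion(flags: Flags) -> Optional[float]:
    """Proportion of non-missing components that are unhealthy.

    Returns None when no component was observed at all.
    """
    observed = [f for f in flags if f is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def missingness_pattern(flags: Flags) -> Tuple[bool, ...]:
    """Tuple marking which components are missing (True = missing)."""
    return tuple(f is None for f in flags)


def group_by_pattern(patients: Dict[str, Flags]) -> Dict[Tuple[bool, ...], List[str]]:
    """Group patient IDs that share the same missing data pattern."""
    groups: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
    for patient_id, flags in patients.items():
        groups[missingness_pattern(flags)].append(patient_id)
    return dict(groups)


# Hypothetical patients (illustrative values only, not study data).
patients = {
    "A": [True, None, False, True, None, None, False, True, None, False],
    "B": [False, None, False, False, None, None, True, False, None, False],
    "C": [True, True, False, False, False, False, False, False, False, False],
}

for pid, flags in patients.items():
    print(pid, complete_case_proportion(flags))

print(len(group_by_pattern(patients)), "distinct missing data patterns")
```

In this toy example, patients A and B share a pattern (the same four components missing), so a pattern submodel approach would fit one model to them using only their six observed components, while patient C would fall into a different submodel.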