Prediction of physiologic states is important in medical practice because interventions are guided by their predicted impacts. But prediction is difficult in medicine because the generating system is complex and difficult to understand from data alone, and because the human costs of data collection leave the data sparse relative to the complexity of the generating processes. Computational machinery can potentially make prediction more accurate, but working within the constraints of realistic clinical data makes robust inference difficult because the data are sparse, noisy, and non-stationary. This paper focuses on prediction given sparse, non-stationary electronic health record data in the intensive care unit (ICU) using data assimilation (DA), a broad collection of methods that pairs mechanistic models with inference machinery such as the Kalman filter. We find that making inference with sparse clinical data accurate and robust requires advancements beyond standard DA methods, combined with additional machine learning methods. Specifically, we show that combining the newly developed constrained ensemble Kalman filter with machine learning methods can produce substantial gains in robustness and accuracy while minimizing the data requirements. We also identify limitations of Kalman filtering methods that lead to new problems to be overcome to make inference feasible in clinical settings using realistic clinical data.