Machine learning (ML) models are increasingly pivotal in automating clinical decisions. Yet prior research has largely overlooked the proper handling of errors and outliers in Electronic Medical Record (EMR) data in the clinical context. To address this gap, we introduce a projections-based method that integrates clinical expertise as domain constraints and generates meta-data that can be used in ML workflows. In particular, we formulate high-dimensional mixed-integer programs that capture physiological and biological constraints on patient vitals and lab values, and use mathematical "projections" of the EMR data onto these constraints to correct patient data. We then measure the distance of the corrected data from the constraints defining a healthy range of patient data, yielding a predictive metric we term "trust-scores". These scores provide insight into the patient's health status and significantly boost the performance of ML classifiers in real-life clinical settings. We validate the impact of our framework in the context of early detection of sepsis using ML, achieving an AUROC of 0.865 and a precision of 0.922, which surpasses conventional ML models that do not use such projections.
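To make the two steps concrete (projecting a record onto physiological constraints, then measuring its distance from a healthy range), the following is a minimal sketch under strong simplifying assumptions. It replaces the mixed-integer programs described above with independent per-feature interval (box) constraints, for which the Euclidean projection reduces to clamping each value. The feature names, the bounds in `FEASIBLE` and `HEALTHY`, and the functions `project_onto_box` and `trust_score` are hypothetical placeholders chosen for illustration; they are neither clinical values nor the formulation used in this work.

```python
import math

# Hypothetical, illustrative bounds only (not clinical guidance, not from the paper).
# FEASIBLE: values outside this box are treated as recording errors.
# HEALTHY: the box assumed here to define a "healthy" range.
FEASIBLE = {"heart_rate": (0.0, 300.0), "temp_c": (25.0, 45.0)}
HEALTHY = {"heart_rate": (60.0, 100.0), "temp_c": (36.0, 38.0)}


def project_onto_box(record, bounds):
    """Euclidean projection onto a box: clamp each feature to [lo, hi]."""
    return {k: min(max(v, bounds[k][0]), bounds[k][1]) for k, v in record.items()}


def trust_score(record):
    """Return (corrected record, distance of corrected record from the healthy box)."""
    corrected = project_onto_box(record, FEASIBLE)          # step 1: correct errors
    nearest_healthy = project_onto_box(corrected, HEALTHY)  # closest healthy point
    distance = math.sqrt(
        sum((corrected[k] - nearest_healthy[k]) ** 2 for k in corrected)
    )
    return corrected, distance


if __name__ == "__main__":
    # Plausible but abnormal record: no correction needed, nonzero distance.
    print(trust_score({"heart_rate": 103.0, "temp_c": 42.0}))
    # Implausible heart rate: clamped to the feasible bound before scoring.
    print(trust_score({"heart_rate": 400.0, "temp_c": 37.0}))
```

In this sketch the distance is computed on raw, unscaled features, so features with large numeric ranges dominate the score; a realistic pipeline would presumably normalize features and couple them through joint constraints, which is where a mixed-integer formulation becomes necessary.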