Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems and black box machine learning (ML) models. Both have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (including hand-tuning of logistic regression parameters) relies on humans to perform high-dimensional constrained optimization, which leads to a loss in performance. In this work, we bridge the gap between accurate black box models and hand-tuned interpretable models. We build on modern interpretable ML techniques to design accurate and interpretable mortality risk scores. We leverage the largest existing public ICU monitoring datasets, namely the MIMIC III and eICU datasets. By evaluating risk across medical centers, we are able to study generalization across domains. To customize our risk score models, we develop a new algorithm, GroupFasterRisk, which has several important benefits: (1) it uses a hard sparsity constraint, allowing users to directly control the number of features; (2) it incorporates group sparsity to allow more cohesive models; (3) it allows for monotonicity correction on models to incorporate domain knowledge; (4) it produces many equally good models at once, which allows domain experts to choose among them. GroupFasterRisk creates its risk scores within hours, even on the large datasets we study here. GroupFasterRisk's risk scores perform better than risk scores currently used in hospitals and have similar prediction performance to black box ML models, despite being much sparser. Because GroupFasterRisk produces a variety of risk scores and handles constraints, it allows design flexibility, which is the key enabler of practical and trustworthy model creation.
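To make the terms "risk score", "group sparsity", and "monotonicity" concrete, the following is a minimal sketch of how a sparse integer risk score could be evaluated. It is not the GroupFasterRisk algorithm and does not use its API. The variable names (`age`, `gcs`), thresholds, point values, intercept, and multiplier are all hypothetical and not taken from the paper, and the logistic link from total points to probability is an assumed, common convention for such scores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreItem:
    group: str        # underlying clinical variable (hypothetical name)
    threshold: float  # the item fires when the variable's value <= threshold
    points: int       # integer points added when the item fires


# Illustrative items only; these values are NOT taken from the paper.
ITEMS = [
    ScoreItem("age", 60.0, -2),
    ScoreItem("age", 75.0, -1),
    ScoreItem("gcs", 8.0, 3),
    ScoreItem("gcs", 12.0, 1),
]
INTERCEPT = 0      # hypothetical integer offset
MULTIPLIER = 1.0   # hypothetical scaling of points before the logistic link


def total_points(patient: dict) -> int:
    """Sum the points of every item whose threshold condition holds."""
    return sum(it.points for it in ITEMS if patient[it.group] <= it.threshold)


def predicted_risk(patient: dict) -> float:
    """Map total points to a probability with a logistic link (assumed form)."""
    z = (INTERCEPT + total_points(patient)) / MULTIPLIER
    return 1.0 / (1.0 + math.exp(-z))


def groups_used(items: list) -> set:
    """Group sparsity counts distinct variables, not individual threshold items."""
    return {it.group for it in items if it.points != 0}


def is_monotone(items: list, group: str, direction: str) -> bool:
    """Check that risk moves in one direction as the variable grows.

    With "value <= threshold" items, raising the value can only switch items
    off, so risk is non-decreasing in the variable when all of its points are
    <= 0 and non-increasing when all of its points are >= 0.
    """
    pts = [it.points for it in items if it.group == group]
    if direction == "increasing":
        return all(p <= 0 for p in pts)
    return all(p >= 0 for p in pts)
```

In this toy score, a patient with `{"age": 80, "gcs": 7}` collects 4 points, the model uses two groups although it has four threshold items, and the `age` group satisfies an increasing-risk constraint while the `gcs` group satisfies a decreasing-risk one.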