We consider a patient risk model that has access to patient features such as vital signs, lab values, and prior history but not to the patient's diagnosis. This situation arises, for example, when a model is deployed at intake time for triage. We show that such `all-cause' risk models generalize well across diagnoses but have a predictable failure mode. When the same lab/vital/history profile can result from diagnoses with different risk profiles (e.g., E. coli vs. MRSA), the risk estimate is a probability-weighted average of those profiles. This leads to underestimation of risk for rare but high-risk diagnoses. We propose to fix this problem by explicitly modeling the uncertainty in risk prediction that comes from uncertainty in the patient's diagnosis. This gives practitioners an interpretable way to understand patient risk beyond a single risk number.
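The averaging effect described above can be made concrete with a small numerical sketch. Every number and variable name below is hypothetical and chosen only for illustration; none comes from the paper, and the code is not the proposed method. It only shows how a marginal risk estimate can hide a rare, high-risk diagnosis.

```python
# Hypothetical numbers for illustration only; they are not from the paper.
# Two diagnoses that can produce the same observed lab/vital/history profile x.
p_diagnosis = {"E. coli": 0.95, "MRSA": 0.05}    # assumed P(diagnosis | x)
risk_given_dx = {"E. coli": 0.02, "MRSA": 0.30}  # assumed P(adverse event | diagnosis, x)

# A model that never sees the diagnosis can at best report the marginal risk,
# i.e. the probability-weighted average of the per-diagnosis risks.
all_cause_risk = sum(p_diagnosis[d] * risk_given_dx[d] for d in p_diagnosis)

# Keeping the per-diagnosis breakdown leaves the rare, high-risk case visible.
breakdown = sorted(risk_given_dx.items(), key=lambda kv: kv[1], reverse=True)

print(f"all-cause risk: {all_cause_risk:.4f}")
for dx, risk in breakdown:
    print(f"  {dx}: P(dx | x) = {p_diagnosis[dx]:.2f}, risk = {risk:.2f}")
```

Under these assumed values the single all-cause number is about 0.034, far below the 0.30 risk that would apply if the patient actually had the rarer diagnosis. Reporting the breakdown alongside the average is one simple way to expose that gap.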