Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions and are often hailed as the poster children of personalized, data-driven healthcare. We show, however, that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcomes of these patients do not invalidate the predictive power of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision making, because their deployment did not change the data distribution. These results point to the need to revise standard practices for the validation, deployment, and evaluation of prediction models used in medical decisions.
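The following toy simulation is a minimal, hypothetical sketch of a harmful self-fulfilling prophecy. It is not the formal characterization from the paper. The cohort size, the risk values, the binary marker `x`, and the deployment policy (withhold treatment when predicted risk is at least 0.2) are all illustrative assumptions. The function names `cohort` and `auc` are likewise made up for this example. A model is fitted on historical data in which everyone was treated. It is then used to withhold treatment from patients it labels high risk. Those patients die more often, and the model's discrimination (AUC) goes up rather than down.

```python
from fractions import Fraction

# Hypothetical cohort: N patients in each group of a binary marker x
# (x=1: model says high risk, x=0: model says low risk).
N = 1000

# Predicted risk of death, fitted on historical data in which everyone was treated.
PREDICTED = {1: Fraction(3, 10), 0: Fraction(1, 10)}

# Assumed true risk of death with and without treatment (illustrative numbers).
RISK_TREATED = {1: Fraction(3, 10), 0: Fraction(1, 10)}
RISK_UNTREATED = {1: Fraction(6, 10), 0: Fraction(2, 10)}


def cohort(treat):
    """Expected (predicted score, deaths, survivors) per group under a policy."""
    rows = []
    for x in (1, 0):
        risk = RISK_TREATED[x] if treat(x) else RISK_UNTREATED[x]
        deaths = int(N * risk)
        rows.append((PREDICTED[x], deaths, N - deaths))
    return rows


def auc(rows):
    """Exact AUC: share of (death, survivor) pairs where the death scored higher.

    Pairs with tied scores count as one half.
    """
    concordant = Fraction(0)
    for score_dead, deaths, _ in rows:
        for score_alive, _, survivors in rows:
            pairs = deaths * survivors
            if score_dead > score_alive:
                concordant += pairs
            elif score_dead == score_alive:
                concordant += Fraction(pairs, 2)
    total_deaths = sum(d for _, d, _ in rows)
    total_survivors = sum(s for _, _, s in rows)
    return concordant / (total_deaths * total_survivors)


# Before deployment: everyone is treated.
before = cohort(lambda x: True)
# After deployment (hypothetical policy): treatment is withheld when predicted risk >= 0.2.
after = cohort(lambda x: PREDICTED[x] < Fraction(2, 10))

auc_before, auc_after = auc(before), auc(after)
deaths_high_before, deaths_high_after = before[0][1], after[0][1]

print(f"AUC before: {float(auc_before):.2f}, after: {float(auc_after):.2f}")
# -> AUC before: 0.66, after: 0.77
print(f"Deaths in high-risk group: {deaths_high_before} -> {deaths_high_after}")
# -> Deaths in high-risk group: 300 -> 600
```

In this toy example, the high-risk group's observed mortality after deployment (0.6) no longer matches the predicted 0.3, so the model loses calibration even as its AUC improves. This agrees with the point above: a model that stays calibrated both before and after deployment would imply that deployment left the outcome distribution unchanged.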