Objective: Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and are often hailed as the poster children for personalized, data-driven healthcare. Many prediction models are deployed for decision support based on their prediction accuracy in validation studies. We investigate whether this is a safe and valid approach. Materials and Methods: We show that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcomes of these patients do not invalidate the predictive power of the model. Results: Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated both before and after deployment are useless for decision making, since they have not changed the data distribution. Discussion: Our results point to the need to revise standard practices for validation, deployment and evaluation of prediction models that are used in medical decisions. Conclusion: Outcome prediction models can yield harmful self-fulfilling prophecies when used for decision making. A new perspective on prediction model development, deployment and monitoring is needed.
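The self-fulfilling mechanism described in the abstract can be illustrated with a small numerical sketch. Everything in it is a hypothetical assumption made for illustration and is not taken from the paper: two equally sized patient groups, the four outcome probabilities, the policy of withholding treatment from patients the model flags as high risk, and the helper name `auc_binary_score`. The AUC is computed exactly from these probabilities rather than estimated from simulated data.

```python
def auc_binary_score(p_high, risk_high, risk_low):
    """Exact AUC of a risk score that takes only two values.

    p_high    : share of patients in the group that receives the higher score
    risk_high : true probability of the bad outcome in that group
    risk_low  : true probability of the bad outcome in the other group
    """
    # Joint probabilities of (group, outcome).
    pos_high = p_high * risk_high
    pos_low = (1 - p_high) * risk_low
    neg_high = p_high * (1 - risk_high)
    neg_low = (1 - p_high) * (1 - risk_low)
    # A (bad outcome, good outcome) pair is concordant when the patient with
    # the bad outcome has the higher score; pairs from the same group are ties.
    concordant = pos_high * neg_low
    ties = pos_high * neg_high + pos_low * neg_low
    return (concordant + 0.5 * ties) / ((pos_high + pos_low) * (neg_high + neg_low))


# Hypothetical outcome probabilities (assumptions for illustration only).
P_HIGH_GROUP = 0.5          # share of patients in the high-risk group
RISK_HIGH_TREATED = 0.6     # high-risk group, treated
RISK_HIGH_UNTREATED = 0.9   # high-risk group, untreated
RISK_LOW_TREATED = 0.2      # low-risk group, treated

# Before deployment everyone is treated; the model learns these risks.
auc_pre = auc_binary_score(P_HIGH_GROUP, RISK_HIGH_TREATED, RISK_LOW_TREATED)

# After deployment, treatment is withheld from patients flagged as high risk
# (assumed policy), so their true risk rises while their score stays highest.
auc_post = auc_binary_score(P_HIGH_GROUP, RISK_HIGH_UNTREATED, RISK_LOW_TREATED)

# Harm to the high-risk group caused by the new policy.
harm = RISK_HIGH_UNTREATED - RISK_HIGH_TREATED

print(f"AUC before deployment: {auc_pre:.3f}")
print(f"AUC after deployment:  {auc_post:.3f}")
print(f"Increase in bad-outcome risk for the flagged group: {harm:.2f}")
```

Under these assumed numbers, discrimination improves after deployment even though the flagged group is worse off, so a post-deployment validation based on discrimination alone would not reveal the harm. The model is no longer calibrated for the flagged group after deployment (predicted 0.6, actual 0.9), which fits the abstract's point that a model calibrated both before and after deployment cannot have changed the data distribution.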