Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and they are often hailed as the poster children for personalized, data-driven healthcare. We show, however, that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcomes of these patients do not invalidate the predictive power of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated both before and after deployment are useless for decision making, because they did not change the data distribution. These results point to the need to revise standard practices for the validation, deployment, and evaluation of prediction models that are used in medical decisions.