Prediction under hypothetical interventions: evaluation of performance using longitudinal observational data

Prediction models estimate an individual's risk of an adverse event based on their characteristics. Some prediction models have been used to make treatment decisions, but this is not appropriate when the data on which the model was developed included a mix of individuals who did and did not initiate that treatment. By contrast, predictions under hypothetical interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can provide important input to medical decision making. However, evaluating the predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply, because prediction under interventions involves obtaining predictions of the outcome under conditions that differ from those observed for some patients in the validation data. This work describes methods for evaluating the predictive performance of predictions under interventions using longitudinal observational data. We focus on time-to-event outcomes and predictions under treatment strategies that involve sustaining a particular treatment regime over time. We introduce a validation approach using artificial censoring and inverse probability weighting, which involves creating a validation data set that mimics the particular treatment strategy under which predictions are made. We extend measures of calibration, discrimination and overall prediction error to the interventional prediction setting. The methods are evaluated in a simulation study, and the results show that our proposed approach and the corresponding measures of predictive performance correctly capture the true predictive performance. The methods are applied to an example in the context of liver transplantation.
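The idea of artificial censoring combined with inverse probability weighting can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete follow-up periods, a "never treated" strategy, and adherence probabilities `p_adhere` that have already been estimated elsewhere (for example from a treatment model). All function names, variable names and the toy data are hypothetical.

```python
import numpy as np

def censor_and_weight(treated, p_adhere):
    """Artificially censor people when they deviate from 'never treated'.

    treated[i, k]  : 1 if person i is on treatment in period k, else 0.
    p_adhere[i, k] : assumed probability that person i still follows the
                     strategy in period k given their history (taken as
                     already estimated; hypothetical input).
    Returns a boolean mask of person-periods that follow the strategy and
    the cumulative inverse probability weights for those person-periods.
    """
    # True from the first treated period onward, so the person is censored there.
    deviated = np.cumsum(treated, axis=1) > 0
    follows = ~deviated
    # Weight in period k is the product of 1 / p_adhere over periods 0..k.
    weights = np.cumprod(1.0 / p_adhere, axis=1)
    return follows, np.where(follows, weights, 0.0)

def weighted_observed_risk(event_period, follows, weights):
    """Weighted discrete-time estimate of the risk by the end of follow-up.

    event_period[i] : index of the period in which person i has the event,
                      or -1 if no event is observed.
    Assumes every period has positive total weight.
    """
    n_periods = follows.shape[1]
    periods = np.arange(n_periods)
    no_event = (event_period == -1)[:, None]
    at_risk = no_event | (periods[None, :] <= event_period[:, None])
    event = periods[None, :] == event_period[:, None]
    w = weights * (at_risk & follows)
    hazard = (w * event).sum(axis=0) / w.sum(axis=0)
    return 1.0 - np.prod(1.0 - hazard)

# Toy data: 4 people, 2 periods; person 2 starts treatment in period 1.
treated = np.array([[0, 0], [0, 0], [0, 1], [0, 0]])
p_adhere = np.array([[1.0, 1.0], [1.0, 2 / 3], [1.0, 2 / 3], [1.0, 2 / 3]])
event_period = np.array([0, -1, -1, 1])

follows, weights = censor_and_weight(treated, p_adhere)
risk = weighted_observed_risk(event_period, follows, weights)
print(round(risk, 3))  # 0.625
```

In this sketch the weighted risk plays the role of the "observed" risk under the strategy. One could compare it with the mean predicted risk under the same strategy as a simple calibration check. Weighted versions of discrimination and prediction error measures would reuse the same mask and weights.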
