Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when the prediction question is explicitly counterfactual. However, estimating and evaluating counterfactual prediction models is challenging because one does not observe the full set of potential outcomes for all individuals. Here, we discuss how to tailor a model to a counterfactual estimand, how to assess the model's performance, and how to perform model and tuning parameter selection. We also provide identifiability results for measures of performance for a potentially misspecified counterfactual prediction model based on training and test data from the same (factual) source population. Last, we illustrate the methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.