Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when the prediction question is explicitly counterfactual. However, estimating and evaluating counterfactual prediction models is challenging because one does not observe the full set of potential outcomes for all individuals. Here, we discuss how to tailor a model to a counterfactual estimand, how to assess the model's performance, and how to perform model and tuning parameter selection. We also provide identifiability results for measures of performance for a potentially misspecified counterfactual prediction model based on training and test data from the same (factual) source population. Last, we illustrate the methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.