We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation. Unlike in standard supervised machine learning, there is no perfect analogue of cross-validation for model selection, because the counterfactual potential outcomes are never observed. To address this, a variety of surrogate metrics that use only observed data have been proposed for CATE model selection. However, their effectiveness is not well understood, owing to limited comparisons in prior studies. We conduct an extensive empirical analysis to benchmark the surrogate model selection metrics introduced in the literature, as well as novel ones introduced in this work. We ensure a fair comparison by tuning the hyperparameters associated with these metrics via AutoML, and we obtain more detailed trends by incorporating realistic datasets via generative modeling. Our analysis suggests novel model selection strategies based on careful hyperparameter selection of CATE estimators and causal ensembling.
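The abstract does not spell out any particular surrogate metric. As one illustration of the general idea (scoring a CATE model using only observed data), the sketch below scores candidate CATE predictions by their mean squared error against a doubly robust (AIPW) pseudo-outcome, one common family of surrogate metrics. This is a minimal sketch under stated assumptions, not the method of this work: the function names `dr_pseudo_outcome` and `dr_score` are hypothetical, the data are a synthetic randomized experiment with known effect $\tau(x) = 2x$, and the nuisance models (outcome regressions and propensity) are set to the true functions for simplicity, whereas in practice they would be fit on held-out data.

```python
import numpy as np


def dr_pseudo_outcome(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) pseudo-outcome for the CATE, built from observed data only.

    mu0, mu1: estimates of E[Y | X, T=0] and E[Y | X, T=1]; e: estimate of P(T=1 | X).
    """
    mu_t = np.where(t == 1, mu1, mu0)  # outcome model for the arm actually received
    return mu1 - mu0 + (t - e) / (e * (1.0 - e)) * (y - mu_t)


def dr_score(tau_hat, y, t, mu0, mu1, e):
    """Surrogate risk: MSE of candidate CATE predictions vs. the DR pseudo-outcome (lower is better)."""
    phi = dr_pseudo_outcome(y, t, mu0, mu1, e)
    return float(np.mean((phi - tau_hat) ** 2))


# Hypothetical toy randomized experiment with known effect tau(x) = 2x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)
t = rng.binomial(1, 0.5, size=n)
tau = 2.0 * x
y = x + t * tau + rng.normal(0.0, 1.0, size=n)

# Assumption: oracle nuisances, for illustration only.
mu0, mu1, e = x, x + tau, np.full(n, 0.5)

# Three candidate CATE models; the surrogate metric never sees the true tau directly.
candidates = {"true": 2.0 * x, "scaled": 1.5 * x, "zero": np.zeros(n)}
scores = {name: dr_score(tau_hat, y, t, mu0, mu1, e) for name, tau_hat in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Although each individual pseudo-outcome is noisy, its noise does not depend on the candidate model, so differences in the averaged score track differences in true CATE error. Here the metric ranks the candidates correctly without access to counterfactual outcomes.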