Predictive models, as in machine learning, can underpin causal inference to estimate the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but also opens the Pandora's box of model selection: which of these models yield the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should weight equally both outcome errors for each individual, treated or not treated, whereas only one of the two outcomes is observed for each individual, and one outcome may seldom be observed for a given sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals, as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy for the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also from the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure gives the best causal model selection.
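To make the R-risk concrete, here is a minimal sketch assuming its common R-learner form, R-risk(τ) = E[((Y − m(X)) − (A − e(X)) τ(X))²], where m(x) = E[Y | X = x] is the conditional mean outcome and e(x) = P(A = 1 | X = x) is the treatment probability. With exact m and e, this equals E[e(X)(1 − e(X))(τ(X) − τ*(X))²] plus a constant noise term, which is the overlap re-weighted oracle error on the effect τ*. The adapted cross-validation is sketched as: fit m and e on a training split, then score each candidate effect model on the held-out split. The toy data, the per-stratum nuisance estimators, the fixed candidate models, and all function names (`simulate`, `fit_nuisances`, `r_risk`, `select_model`) are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical toy data: discrete covariate x in {0, 1, 2},
    binary treatment a, outcome y with true effect tau*(x) = 1 + x."""
    rng = random.Random(seed)
    propensity = [0.2, 0.5, 0.8]  # P(A=1 | X=x): weaker overlap at x=0 and x=2
    data = []
    for _ in range(n):
        x = rng.randrange(3)
        a = 1 if rng.random() < propensity[x] else 0
        y = 2.0 * x + a * (1.0 + x) + rng.gauss(0.0, 1.0)
        data.append((x, a, y))
    return data


def fit_nuisances(train):
    """Estimate m(x) = E[Y | X=x] and e(x) = P(A=1 | X=x) by per-stratum
    means (a stand-in for any flexible regressor/classifier)."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum of a, sum of y
    for x, a, y in train:
        s = sums[x]
        s[0] += 1
        s[1] += a
        s[2] += y
    m_hat = {x: s[2] / s[0] for x, s in sums.items()}
    e_hat = {x: s[1] / s[0] for x, s in sums.items()}
    return m_hat, e_hat


def r_risk(tau, val, m_hat, e_hat):
    """Empirical R-risk of a candidate effect model tau on held-out data."""
    total = 0.0
    for x, a, y in val:
        resid = (y - m_hat[x]) - (a - e_hat[x]) * tau(x)
        total += resid ** 2
    return total / len(val)


def select_model(candidates, data, train_frac=0.5):
    """Fit nuisances on one split, score every candidate on the other."""
    cut = int(len(data) * train_frac)
    train, val = data[:cut], data[cut:]
    m_hat, e_hat = fit_nuisances(train)
    scores = {name: r_risk(tau, val, m_hat, e_hat)
              for name, tau in candidates.items()}
    return min(scores, key=scores.get), scores


# Candidate effect models are fixed here for brevity; in practice they
# would themselves be fitted on the training split.
candidates = {
    "true_effect": lambda x: 1.0 + x,
    "constant_2": lambda x: 2.0,
    "no_effect": lambda x: 0.0,
}
best, scores = select_model(candidates, simulate(20000))
print(best, {k: round(v, 3) for k, v in scores.items()})
```

Note that the R-risk never needs the unobserved counterfactual outcome: it only uses the observed (x, a, y) and the two nuisance estimates, which is why they must be fitted on data separate from the split used for scoring.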