How to select predictive models for causal inference?

Predictive models, such as those from machine learning, can underpin causal inference by estimating the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but also opens the Pandora's box of model selection: which of these models yields the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should weight the errors on both potential outcomes equally for each individual, treated or not, whereas one of the two outcomes may be seldom observed for a sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals, as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy of the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also from the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure gives the best causal model selection.
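To make the R-risk concrete, here is a minimal sketch. It assumes the commonly used residual-on-residual form, the mean of ((y - m(x)) - (a - e(x)) * tau(x))^2, where m is the conditional mean outcome, e the treatment probability, and tau the candidate model's estimated effect. The function name `r_risk`, its arguments, and the toy numbers are illustrative assumptions and do not come from the paper. In practice m and e are unknown and would themselves be estimated on data separate from the evaluation set, which is why cross-validation has to be adapted.

```python
def r_risk(y, a, m_hat, e_hat, tau_hat):
    """Empirical R-risk of candidate treatment-effect estimates.

    y       : observed outcomes
    a       : binary treatment indicators (0 or 1)
    m_hat   : estimated conditional mean outcome E[Y | X] (nuisance, assumed given)
    e_hat   : estimated treatment probability P(A = 1 | X) (nuisance, assumed given)
    tau_hat : candidate model's estimated effect for each individual
    """
    total = 0.0
    for y_i, a_i, m_i, e_i, tau_i in zip(y, a, m_hat, e_hat, tau_hat):
        # Outcome residual minus treatment residual scaled by the candidate effect.
        residual = (y_i - m_i) - (a_i - e_i) * tau_i
        total += residual ** 2
    return total / len(y)


# Hypothetical toy data: four individuals, two treated, two untreated.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
m_hat = [2.0, 2.0, 3.0, 3.0]   # assumed nuisance estimates
e_hat = [0.5, 0.5, 0.5, 0.5]   # assumed nuisance estimates

# Two candidate models: one predicts an effect of 2 for everyone, one predicts 0.
risk_effect_two = r_risk(y, a, m_hat, e_hat, [2.0] * 4)
risk_effect_zero = r_risk(y, a, m_hat, e_hat, [0.0] * 4)

# Model selection keeps the candidate with the lowest R-risk.
best = min(
    [("effect=2", risk_effect_two), ("effect=0", risk_effect_zero)],
    key=lambda pair: pair[1],
)
print(best[0])
```

In this toy setup the candidate with a constant effect of 2 fits the residuals exactly, so it is the one selected. With real data the ranking depends on how well the nuisance estimates are fit.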

