In real-world causal inference tasks, counterfactual inference is needed more often for continuous treatment variables than for binary ones. Existing sample reweighting methods based on the Marginal Structural Model (MSM) can eliminate confounding bias. However, they generally remove only the treatment's linear dependence on the confounders, and they rely on the accuracy of assumed parametric models, which are usually unverifiable. In this paper, we propose a de-confounding representation learning (DRL) framework for estimating counterfactual outcomes under continuous treatment. DRL generates covariate representations that are disentangled from the treatment variables. It is a non-parametric model that eliminates both linear and nonlinear dependence between the treatment and the covariates. Specifically, to eliminate confounding bias, we adversarially train the correlations between the de-confounded representations and the treatment variables against the correlations between the covariate representations and the treatment variables. A counterfactual inference network is also embedded in the framework, so the learned representations serve both de-confounding and trustworthy inference. Extensive experiments on synthetic datasets show that DRL learns de-confounding representations effectively and outperforms state-of-the-art counterfactual inference models for continuous treatment variables. We also apply DRL to the real-world medical dataset MIMIC and reveal a detailed causal relationship between red cell distribution width (RDW) and mortality.
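The abstract contrasts methods that remove only linear dependence with a model that removes nonlinear dependence as well. The sketch below is not the DRL method. It illustrates the gap with a hypothetical toy confounder `x` and treatment `t = x**2`. Pearson correlation between them is essentially zero, yet the treatment is fully determined by the confounder. Sample distance correlation, a standard dependence measure swapped in here purely for illustration, is clearly positive. The helpers `pearson` and `distance_correlation` are hypothetical names introduced for this example.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: captures only linear dependence."""
    return float(np.corrcoef(x, y)[0, 1])


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic) for 1-D arrays.

    Its population version is zero only under independence, so it also
    detects nonlinear dependence.
    """
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distance matrix for x
    b = np.abs(y - y.T)  # pairwise distance matrix for y
    # Double-centre each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative rounding
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))


# Hypothetical toy data: a symmetric confounder and a treatment that depends
# on it only nonlinearly.
x = np.linspace(-1.0, 1.0, 201)
t = x ** 2

r_lin = pearson(x, t)                 # approximately 0: no linear dependence
d_nonlin = distance_correlation(x, t)  # clearly > 0: nonlinear dependence remains

print(f"Pearson r = {r_lin:.2e}, distance correlation = {d_nonlin:.3f}")
```

A reweighting scheme that only drives the linear correlation to zero would treat this pair as already "balanced," even though the treatment is fully determined by the confounder. This is the kind of residual dependence the abstract argues a non-parametric approach should remove.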