Understanding and characterizing causal effects have become essential in observational studies, but doing so is challenging when the confounders are high-dimensional. In this article, we develop a general framework, $\textit{CausalEGM}$, for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounder space and a low-dimensional latent space where the density is known (e.g., a multivariate normal distribution). Through this transformation, CausalEGM simultaneously decouples the dependencies of the confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on the encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high-dimensional confounders. The CausalEGM software is freely available at https://github.com/SUwonglab/CausalEGM.
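The idea of adjusting for low-dimensional latent features instead of the raw high-dimensional confounders can be illustrated with a small simulation. The sketch below is not the CausalEGM implementation and does not use its API: it swaps the learned encoder-decoder networks for plain PCA and the outcome model for ordinary least squares, and all names (`simulate`, `pca_encode`, `estimate_ate`), the data-generating process, and the parameter values are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate(n=5000, p=50, k=2, true_effect=2.0, seed=0):
    """Toy data (an assumption, not from the paper): k latent factors drive
    p observed confounders, the binary treatment, and the outcome."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, k))                        # low-dimensional latent factors
    loadings = rng.normal(size=(k, p))
    v = z @ loadings + 0.1 * rng.normal(size=(n, p))   # high-dimensional confounders
    propensity = 1.0 / (1.0 + np.exp(-z[:, 0]))        # treatment depends on z[:, 0]
    x = rng.binomial(1, propensity)
    y = true_effect * x + z[:, 0] + 0.5 * z[:, 1] + 0.5 * rng.normal(size=n)
    return v, x, y


def pca_encode(v, k):
    """Stand-in encoder: project centered confounders onto the top-k principal axes."""
    vc = v - v.mean(axis=0)
    _, _, vt = np.linalg.svd(vc, full_matrices=False)
    return vc @ vt[:k].T


def estimate_ate(v, x, y, k):
    """Fit a linear outcome model on (treatment, latent features), then average
    the predicted difference between treating and not treating each unit."""
    z_hat = pca_encode(v, k)
    ones = np.ones(len(y))
    design = np.column_stack([ones, x, z_hat])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    treated = np.column_stack([ones, ones, z_hat]) @ coef
    control = np.column_stack([ones, np.zeros(len(y)), z_hat]) @ coef
    return float(np.mean(treated - control))


if __name__ == "__main__":
    v, x, y = simulate()
    naive = y[x == 1].mean() - y[x == 0].mean()   # ignores confounding
    adjusted = estimate_ate(v, x, y, k=2)         # conditions on latent features
    print(f"naive difference in means: {naive:.3f}")
    print(f"latent-adjusted estimate:  {adjusted:.3f}")
```

In this toy setup the naive difference in means is biased because the first latent factor affects both treatment and outcome, while the estimate that conditions on two latent features should land close to the simulated effect of 2.0. A linear projection is enough here only because the simulated confounders are linear in the latent factors; the framework described above instead learns a bidirectional nonlinear mapping.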