Unobserved confounding is one of the main challenges when estimating causal effects. We propose a causal reduction method that, given a causal model, replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that takes values in the same space as the treatment variable, without changing the observational and interventional distributions the causal model entails. This allows us to estimate the causal effect in a principled way from combined observational and interventional data without relying on the common but often unrealistic assumption that all confounders have been observed. We apply our causal reduction in three different settings. In the first setting, we assume the treatment and outcome to be discrete. The causal reduction then implies bounds relating the observational and interventional distributions that can be exploited for estimation purposes. In certain cases with highly unbalanced observational samples, the accuracy of the causal effect estimate can be improved by incorporating observational data. Second, for continuous variables and assuming a linear-Gaussian model, we derive equality constraints for the parameters of the observational and interventional distributions. Third, for the general continuous setting (possibly nonlinear and non-Gaussian), we parameterize the reduced causal model using normalizing flows, a flexible class of easily invertible nonlinear transformations. We perform a series of experiments on synthetic data and find that in several cases the number of interventional samples can be reduced when adding observational training samples without sacrificing accuracy.
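To make the role of unobserved confounding in the linear-Gaussian setting concrete, the following is a minimal simulation sketch. It is not the causal reduction or the estimator described in the abstract; it only illustrates why observational and interventional distributions differ when a confounder is latent. The model, the function names `simulate` and `slope`, and the coefficients `b` and `c` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(n, b=1.5, c=2.0, x_do=None, seed=0):
    """Sample from a toy linear-Gaussian model (an assumption, not the paper's model).

    Structure: Z -> X, Z -> Y, X -> Y, with Z unobserved.
    If x_do is given, X is set by intervention and no longer depends on Z.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # latent confounder
    if x_do is None:
        x = z + rng.normal(size=n)              # observational regime: X depends on Z
    else:
        x = np.asarray(x_do, dtype=float)       # interventional regime: X set externally
    y = b * x + c * z + rng.normal(size=n)      # outcome depends on X and Z
    return x, y


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


n = 200_000

# Observational data: the regression slope absorbs the confounding path via Z.
# Population value: b + c * Cov(X, Z) / Var(X) = 1.5 + 2.0 * (1 / 2) = 2.5.
x_obs, y_obs = simulate(n)
obs_slope = slope(x_obs, y_obs)

# Interventional data: X is randomized independently of Z,
# so the slope targets the causal coefficient b = 1.5.
x_rand = np.random.default_rng(1).normal(size=n)
x_int, y_int = simulate(n, x_do=x_rand)
int_slope = slope(x_int, y_int)

print(f"observational slope:  {obs_slope:.3f}")
print(f"interventional slope: {int_slope:.3f}")
```

In this toy model the gap between the two slopes is determined entirely by the confounder's contribution, which hints at why parameters of the two regimes can be tied together by constraints and why observational samples can carry information about the interventional distribution.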