We consider the problem of estimating the causal effect of a treatment on an outcome in linear structural causal models (SCMs) with latent confounders when only a single proxy variable is available. Several methods, such as the difference-in-differences (DiD) estimator and negative outcome control, have been proposed for this setting. However, these approaches require either restrictive assumptions on the data-generating model or access to at least two proxy variables. We propose a method that estimates the causal effect using cross moments between the treatment, the outcome, and the proxy variable. In particular, we show that the causal effect can be identified with simple arithmetic operations on the cross moments if the latent confounder in the linear SCM is non-Gaussian. In this setting, the DiD estimator provides an unbiased estimate only in the special case where the latent confounder has exactly the same direct causal effect on the outcomes in the pre-treatment and post-treatment phases. This condition corresponds to the common trend assumption in DiD, which our method effectively relaxes. Additionally, we provide an impossibility result showing that the causal effect cannot be identified if the observational distribution over the treatment, the outcome, and the proxy is jointly Gaussian. Our experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed approach in estimating the causal effect.
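To make the idea of "simple arithmetic operations on the cross moments" concrete, the following is a minimal sketch and not necessarily the estimator derived in the paper. It assumes a toy linear SCM with latent confounder U, proxy Z = a_z U + e_z, treatment D = a_d U + e_d, and outcome Y = beta D + gamma U + e_y, with mutually independent zero-mean noises. It further assumes that U has a nonzero third cumulant (skewness), which is a stronger condition than non-Gaussianity. Under these assumptions, E[Z D^2] / E[Z^2 D] equals a_d / a_z, and beta = (Cov(D,Y) - r Cov(Z,Y)) / (Var(D) - r Cov(Z,D)) with r = a_d / a_z. All function names, coefficients, and distributions below are hypothetical choices for illustration.

```python
import numpy as np


def simulate(n, beta, seed=0):
    """Draw n samples from a toy linear SCM (all coefficients are illustrative)."""
    rng = np.random.default_rng(seed)
    u = rng.exponential(1.0, n) - 1.0                  # latent confounder: zero mean, skewed
    z = 1.0 * u + rng.normal(0.0, 1.0, n)              # proxy variable
    d = 1.0 * u + rng.normal(0.0, 1.0, n)              # treatment
    y = beta * d + 1.0 * u + rng.normal(0.0, 1.0, n)   # outcome
    return z, d, y


def cross_moment_effect(z, d, y):
    """Estimate beta from second- and third-order cross moments.

    Assumes the latent confounder has a nonzero third cumulant; this is an
    illustrative special case, not the paper's general estimator.
    """
    z, d, y = (v - v.mean() for v in (z, d, y))        # center all variables
    # Ratio of third-order cross moments recovers a_d / a_z.
    ratio = np.mean(z * d**2) / np.mean(z**2 * d)
    numerator = np.mean(d * y) - ratio * np.mean(z * y)
    denominator = np.mean(d * d) - ratio * np.mean(z * d)
    return numerator / denominator


def naive_ols_effect(d, y):
    """Regress Y on D while ignoring the confounder (biased baseline)."""
    d, y = d - d.mean(), y - y.mean()
    return np.mean(d * y) / np.mean(d * d)


if __name__ == "__main__":
    z, d, y = simulate(500_000, beta=1.5, seed=0)
    print("cross-moment estimate:", cross_moment_effect(z, d, y))
    print("naive OLS estimate:   ", naive_ols_effect(d, y))
```

In this sketch the third-order moment ratio becomes numerically unstable as the skewness of the confounder approaches zero. That is consistent with the impossibility result stated for the jointly Gaussian case, where all such higher-order cumulants vanish.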