Latent confounders are a fundamental challenge for inferring causal effects from observational data. The instrumental variable (IV) approach is a practical way to address this challenge. Existing IV-based estimators require a known IV or other strong assumptions, such as the existence of two or more IVs in the system, which limits the applicability of the IV approach. In this paper, we consider a relaxed requirement: we assume that an IV proxy exists in the system, without knowing which variable is the proxy. We propose a Variational AutoEncoder (VAE)-based disentangled representation learning method that learns an IV representation from a dataset with latent confounders and then uses this representation to obtain an unbiased estimate of the causal effect from the data. Extensive experiments on synthetic and real-world data demonstrate that the proposed algorithm outperforms existing IV-based and VAE-based estimators.
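To illustrate why an IV removes the bias caused by a latent confounder, the following is a minimal sketch under simplifying assumptions. It is not the proposed VAE-based method. It assumes a linear data-generating process and assumes that a valid instrument `Z` is already available, i.e. it plays the role of the IV representation that the paper's method would learn. It then applies the standard Wald (IV ratio) estimator, `Cov(Z, Y) / Cov(Z, W)`. The function names (`simulate`, `cov`, `naive_ols`, `iv_wald`) and all coefficients are hypothetical and chosen only for illustration.

```python
import random


def simulate(n=20000, beta=2.0, seed=0):
    """Simulate a hypothetical linear model with a latent confounder U.

    Graph: Z -> W -> Y, U -> W, U -> Y. Z is independent of U and
    affects Y only through W, so Z is a valid instrument.
    `beta` is the true causal effect of the treatment W on the outcome Y.
    """
    rng = random.Random(seed)
    z, w, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                          # latent confounder (unobserved)
        zi = rng.gauss(0, 1)                         # instrument
        wi = 1.0 * zi + 1.5 * u + rng.gauss(0, 1)    # treatment
        yi = beta * wi + 2.0 * u + rng.gauss(0, 1)   # outcome
        z.append(zi)
        w.append(wi)
        y.append(yi)
    return z, w, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (v - mb) for x, v in zip(a, b)) / (n - 1)


def naive_ols(w, y):
    """Slope of Y on W; biased because U confounds W and Y."""
    return cov(w, y) / cov(w, w)


def iv_wald(z, w, y):
    """Wald / IV ratio estimator using the instrument Z."""
    return cov(z, y) / cov(z, w)


if __name__ == "__main__":
    z, w, y = simulate()
    # The IV estimate should be close to the true beta = 2.0, while the
    # naive OLS slope is biased upward (its population value is about 2.71).
    print("naive OLS:", round(naive_ols(w, y), 3))
    print("IV (Wald):", round(iv_wald(z, w, y), 3))
```

In this linear setting, the confounder `U` inflates `Cov(W, Y)` but cannot affect `Cov(Z, Y)` because `Z` is independent of `U`. For that reason the ratio recovers `beta`. The difficulty addressed by the abstract is that, in practice, it is not known which variable (if any) can serve as `Z`.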