Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, mixed proxy variables. On synthetic datasets in the high sample size regime, our two-step PCF implementation, using Independent Component Analysis (ICA-PCF), and our end-to-end implementation, using Gradient Descent (GD-PCF), achieve high correlation with the latent confounder and low absolute error in causal effect estimation. On real climate data, ICA-PCF recovers four components that explain $75.9\%$ of the variance in the North Atlantic Oscillation, a known confounder of precipitation patterns in Europe. Code for our PCF implementations and experiments is available at https://github.com/IPL-UV/confound_it. The proposed methodology constitutes a stepping stone towards discovering latent confounders and can be applied to many problems in disciplines dealing with high-dimensional observed proxies, e.g., spatiotemporal fields.
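To make the two-step idea concrete, the following is a minimal sketch of an ICA-then-adjust pipeline on toy data. It is not the code from the repository above. It assumes a linear data-generating process with one latent confounder, a hand-rolled symmetric FastICA with a tanh nonlinearity, and a simple selection rule (keep the component most correlated with both treatment and outcome). All names (`fast_ica`, `ols_effect`, `abs_corr`) and all simulation parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten with PCA: keep the top directions and scale to unit variance.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Zw = U[:, :n_components] * np.sqrt(X.shape[0])

    def sym_decorrelate(W):
        # W <- (W W^T)^(-1/2) W keeps the rows of W orthonormal.
        vals, vecs = np.linalg.eigh(W @ W.T)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(Zw @ W.T)      # g(w_i^T z) for every unit i
        G_prime = 1.0 - G ** 2     # g'(w_i^T z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, row by row.
        W_new = (G.T @ Zw) / Zw.shape[0] - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Zw @ W.T                # estimated independent components


def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])


def ols_effect(t, y, covariates=None):
    """Coefficient of t in a least-squares fit of y on [1, t, covariates]."""
    cols = [np.ones_like(t), t]
    if covariates is not None:
        cols.append(covariates)
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1]


# Toy data (assumed): 3 independent uniform sources; the first one is the
# latent confounder z, observed only through 20 noisy linear proxies.
rng = np.random.default_rng(0)
n, p, k, beta = 5000, 20, 3, 1.5
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, k))  # unit-variance sources
z = S[:, 0]
X = S @ rng.standard_normal((p, k)).T + 0.1 * rng.standard_normal((n, p))
t = z + rng.standard_normal(n)                       # continuous treatment
y = beta * t + 2.0 * z + rng.standard_normal(n)      # outcome

# Step 1: factorize the proxies and pick the most confounder-like component.
components = fast_ica(X, k)
scores = [abs_corr(c, t) * abs_corr(c, y) for c in components.T]
z_hat = components[:, int(np.argmax(scores))]

# Step 2: estimate the effect of t on y, with and without adjusting for z_hat.
naive = ols_effect(t, y)
adjusted = ols_effect(t, y, z_hat)

print("corr(z_hat, z):", abs_corr(z_hat, z))
print("naive abs. error:", abs(naive - beta))
print("adjusted abs. error:", abs(adjusted - beta))
```

The two printed errors mirror the evaluation criteria named in the abstract: correlation between the recovered component and the true confounder, and absolute error of the estimated causal effect. In this toy setting the unadjusted regression absorbs the confounder's contribution to the outcome, while adjusting for the selected component removes most of it.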