Unobserved confounding is a fundamental obstacle to drawing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification from fortuitous external aids, such as instrumental variables or proxies, and obtaining it by means of the ID algorithm, which uses Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to synthesize these two approaches to identification in causal inference to yield the most general identification algorithm for multivariate systems currently known: the proximal ID algorithm. In addition to obtaining nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for unobserved confounders that would otherwise have prevented identification. We also outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach with simulation studies and a data application.