Regression-Based Proximal Causal Inference

In observational studies, identification of causal effects is threatened by the potential for unmeasured confounding. Negative controls have become widely used to evaluate the presence of unmeasured confounding, thereby enhancing the credibility of reported causal effect estimates. Going beyond simply testing for residual confounding, proximal causal inference (PCI) was recently developed to debias causal effect estimates subject to confounding by hidden factors by leveraging a pair of negative control variables, also known as treatment and outcome confounding proxies. While formal statistical inference has been developed for PCI, these methods can be challenging to implement in practice because they involve solving complex integral equations that are typically ill-posed. In this paper, we develop a regression-based PCI approach that implements the PCI framework through a two-stage regression with familiar generalized linear models (GLMs), which obviates the need to solve difficult integral equations. In the first stage, one fits a GLM for the outcome confounding proxy in terms of the treatment confounding proxy and the primary treatment. In the second stage, one fits a GLM for the primary outcome in terms of the primary treatment, using the predicted value from the first-stage regression as an additional regressor, which, as we establish, accounts for any residual confounding for which the proxies are relevant. The proposed approach has merit in that (i) it applies to continuous, count, and binary outcomes, making it relevant to a wide range of real-world applications, and (ii) it is easy to implement using off-the-shelf software for GLMs. We establish the statistical properties of regression-based PCI and illustrate its performance in both synthetic and real-world empirical applications.
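The two-stage procedure can be illustrated with a minimal sketch for the simplest case only: a continuous outcome with an identity link, where both stages reduce to ordinary least squares. This is not the paper's implementation. The simulated data-generating process, the variable names (`U` for the hidden confounder, `Z` and `W` for the treatment and outcome confounding proxies, `A` for treatment, `Y` for outcome), and the helper functions `ols` and `two_stage_pci_linear` are all assumptions made for illustration. Count and binary outcomes would need other link functions and are not covered here.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_pci_linear(Y, A, Z, W):
    """Hypothetical helper: two-stage regression with identity links.

    Stage 1: regress the outcome proxy W on treatment A and treatment proxy Z.
    Stage 2: regress outcome Y on A and the stage-1 fitted values of W.
    Returns the stage-2 coefficient on A.
    """
    ones = np.ones_like(A)
    X1 = np.column_stack([ones, A, Z])
    W_hat = X1 @ ols(X1, W)          # predicted outcome proxy
    X2 = np.column_stack([ones, A, W_hat])
    return ols(X2, Y)[1]             # coefficient on A


# Simulated data (an assumed linear-Gaussian design, not taken from the paper).
rng = np.random.default_rng(0)
n = 50_000
U = rng.normal(size=n)                        # unmeasured confounder
Z = U + rng.normal(size=n)                    # treatment confounding proxy
W = U + rng.normal(size=n)                    # outcome confounding proxy
A = U + rng.normal(size=n)                    # treatment, confounded by U
Y = 2.0 * A + 1.5 * U + rng.normal(size=n)    # true effect of A on Y is 2.0

# Naive regression of Y on A alone ignores U and is biased.
naive = ols(np.column_stack([np.ones(n), A]), Y)[1]
pci = two_stage_pci_linear(Y, A, Z, W)

print(f"naive OLS estimate:     {naive:.3f}")
print(f"two-stage PCI estimate: {pci:.3f}")
```

In this assumed design the naive slope tends to about 2.75 in large samples, whereas the two-stage estimate should land close to the true value of 2.0. The sketch returns a point estimate only. Standard errors from the second stage would need to account for the estimated first-stage regressor, which is not done here.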
