Regression-Based Proximal Causal Inference

In observational studies, identification of causal effects is threatened by the potential for unmeasured confounding. Negative controls have become widely used to evaluate the presence of potential unmeasured confounding, thus enhancing the credibility of reported causal effect estimates. Going beyond simply testing for residual confounding, proximal causal inference (PCI) was recently developed to debias causal effect estimates subject to confounding by hidden factors by leveraging a pair of negative control variables, also known as treatment and outcome confounding proxies. While formal statistical inference has been developed for PCI, these methods can be challenging to implement in practice because they involve solving complex integral equations that are typically ill-posed. In this paper, we develop a regression-based PCI approach that implements the PCI framework through a two-stage regression with familiar generalized linear models (GLMs), which completely obviates the need to solve difficult integral equations. In the first stage, one fits a GLM for the outcome confounding proxy in terms of the treatment confounding proxy and the primary treatment. In the second stage, one fits a GLM for the primary outcome in terms of the primary treatment, using the predicted value of the first-stage regression model as a regressor which, as we establish, accounts for any residual confounding for which the proxies are relevant. The proposed approach has merit in that (i) it is applicable to continuous, count, and binary outcomes, making it relevant to a wide range of real-world applications, and (ii) it is easy to implement using off-the-shelf software for GLMs. We establish the statistical properties of regression-based PCI and illustrate its performance in both synthetic and real-world empirical applications.
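
The two-stage procedure described above can be sketched for the simplest case, a continuous outcome with an identity link, where each GLM fit reduces to ordinary least squares. This is an illustrative sketch under assumptions that are not taken from the paper: a linear-Gaussian data-generating process with a single hidden confounder, and hypothetical function names (`simulate`, `ols`, `two_stage_pci`, `naive_ols`). It returns a point estimate only and makes no attempt at valid standard errors, which would need to account for the first-stage fit.

```python
import numpy as np


def simulate(n, beta=2.0, seed=0):
    """Draw data from an assumed linear-Gaussian model with a hidden confounder U."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                  # hidden confounder (never used in estimation)
    z = u + rng.normal(size=n)              # treatment confounding proxy
    w = u + rng.normal(size=n)              # outcome confounding proxy
    a = u + rng.normal(size=n)              # primary treatment
    y = beta * a + u + rng.normal(size=n)   # primary outcome; beta is the causal effect
    return y, a, z, w


def ols(columns, target):
    """Least-squares fit with an intercept; returns coefficients and fitted values."""
    X = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef, X @ coef


def two_stage_pci(y, a, z, w):
    """Two-stage regression: W on (A, Z), then Y on (A, predicted W)."""
    # Stage 1: outcome proxy regressed on treatment and treatment proxy.
    _, w_hat = ols([a, z], w)
    # Stage 2: outcome regressed on treatment and the stage-1 prediction.
    coef, _ = ols([a, w_hat], y)
    return coef[1]                          # coefficient on the treatment A


def naive_ols(y, a):
    """Confounded comparison: Y regressed on A alone."""
    coef, _ = ols([a], y)
    return coef[1]


if __name__ == "__main__":
    y, a, z, w = simulate(20000, beta=2.0, seed=0)
    print("naive OLS estimate:  ", naive_ols(y, a))
    print("two-stage estimate:  ", two_stage_pci(y, a, z, w))
```

In this assumed design the naive slope converges to the true effect plus 0.5, since Cov(U, A)/Var(A) = 1/2, while the two-stage estimate targets the true effect itself. For count or binary outcomes the same structure would use a GLM with the appropriate link in each stage; that extension is not shown here.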
