Unmeasured confounding bias is among the largest threats to the validity of observational studies. Although sensitivity analyses and various study designs have been proposed to address this issue, they do not leverage the growing availability of auxiliary data accessible through open data platforms. Negative controls have been introduced in the causal inference literature as a promising approach to account for unmeasured confounding bias. In this paper, we develop a Bayesian nonparametric method to estimate a causal exposure-response function (CERF). The method uses auxiliary information from negative control variables to adjust fully for unmeasured confounding. We model the CERF as a mixture of linear models. This strategy captures the potentially nonlinear shape of the CERF while remaining computationally efficient, and it leverages closed-form results that hold under the linear model assumption. We assess the performance of our method through simulation studies. The results demonstrate that the method accurately recovers the true shape of the CERF in the presence of unmeasured confounding. To showcase the practical utility of our approach, we apply it to adjust for a potential unmeasured confounder when evaluating the relationship between long-term exposure to ambient $PM_{2.5}$ and cardiovascular hospitalization rates among the elderly in the continental U.S. We implement our estimation procedure in open-source software and make our code publicly available to ensure transparency and reproducibility.