Unmeasured confounding is a common challenge in observational studies and can render standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, data linkage offers a potential way to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, because linkage is feasible only for a subset of the study population. To address this concern, we explore three nonparametric identification strategies under the assumption that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while allowing the ignorability assumption to depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that each capture a distinct component of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correctly specified. Moreover, they are locally efficient if all the models are correctly specified. We evaluate the proposed estimators using simulation studies and a real data analysis.
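To make the setting concrete, the following is a minimal simulation sketch, not the triply robust estimators proposed in the paper. It assumes a toy data-generating process in which one covariate (`u`) is observed only in the linked cohort, linkage (`s`) depends solely on the fully observed confounder (`x`), and the treatment effect varies with `x`. It then compares a naive full-sample regression that omits `u`, a linked-cohort-only regression, and a simple outcome-regression estimate standardized to the full-sample distribution of `x`. All names, coefficients, and model forms are hypothetical choices made for illustration.

```python
import numpy as np


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n, rng):
    """Toy data-generating process (an assumption, not taken from the paper)."""
    x = rng.normal(size=n)                    # confounder observed for everyone
    u = rng.normal(size=n)                    # confounder observed only if linked
    s = rng.binomial(1, expit(x))             # linkage depends on x only
    a = rng.binomial(1, expit(0.5 * x + u))   # treatment depends on x and u
    # Effect of treatment is 1 + x, so the true ATE is 1 + E[x] = 1.
    y = a * (1.0 + x) + x + 2.0 * u + rng.normal(size=n)
    return x, u, s, a, y


def ols(design, y):
    """Least-squares coefficients for a given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
x, u, s, a, y = simulate(200_000, rng)
ones = np.ones_like(x)

# Naive: full sample, but u is unavailable, so it is omitted (confounded).
b = ols(np.column_stack([ones, a, x, a * x]), y)
ate_naive = b[1] + b[3] * x.mean()

# Linked cohort: u is available, so the outcome model can adjust for it.
m = s == 1
c = ols(np.column_stack([ones[m], a[m], x[m], u[m], a[m] * x[m]]), y[m])

# Averaging the fitted effect over the linked cohort's x targets the wrong
# population, because linked units have systematically larger x.
ate_linked_only = c[1] + c[4] * x[m].mean()

# Standardizing to the full-sample distribution of x corrects the selection.
ate_standardized = c[1] + c[4] * x.mean()

print(f"naive: {ate_naive:.3f}")
print(f"linked only: {ate_linked_only:.3f}")
print(f"standardized: {ate_standardized:.3f}")
```

In this toy setup the standardized estimate should land close to the true value of 1, while the naive and linked-only estimates are biased upward, by unmeasured confounding and by selection into the linked cohort respectively. The sketch relies on a correctly specified outcome model; the point of the triply robust construction described in the abstract is to avoid depending on any single such model.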