Causal inference is vital for informed decision-making in fields such as biomedical research and the social sciences. Randomized controlled trials (RCTs) are considered the gold standard for the internal validity of inferences, whereas observational studies (OSs) often offer greater external validity. However, each data source has inherent limitations that prevent its use for broadly valid statistical inference: RCTs may lack generalizability because of their selective eligibility criteria, and OSs are vulnerable to unobserved confounding. This paper proposes an approach that integrates an RCT and an OS so that the strengths of each study remedy the limitations of the other. The method uses a novel triplet matching algorithm to align the RCT and OS samples and a new two-parameter sensitivity analysis framework to quantify internal and external biases. The combined approach yields causal estimates that are more robust to hidden biases than those from an OS alone and supports reliable inferences about the treatment effect in the general population. We apply the method to investigate the effects of lactation on maternal health, using a small RCT and a long-term observational health records dataset from the California National Primate Research Center. This application demonstrates the practical utility of our approach in generating scientifically sound and actionable causal estimates.
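To make the idea of aligning RCT and OS samples by triplet matching more concrete, the following is a minimal sketch only. The abstract does not specify how triplets are formed, so everything here is an assumption: the sketch supposes that a triplet consists of one RCT unit, one OS treated unit, and one OS control unit with similar covariates, and it uses a simple greedy nearest-neighbor rule in place of whatever algorithm the paper actually uses. The function name `greedy_triplet_match` and the toy covariate vectors are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def greedy_triplet_match(rct, os_treated, os_control):
    """Greedily form (rct, os_treated, os_control) index triplets.

    Each argument is a list of covariate vectors (tuples of floats).
    Every OS unit is used at most once. This is an illustrative
    stand-in, not the matching algorithm proposed in the paper.
    """
    free_t = set(range(len(os_treated)))
    free_c = set(range(len(os_control)))
    triplets = []
    for i, x in enumerate(rct):
        if not free_t or not free_c:
            break  # ran out of OS units to match
        # Nearest still-unused OS treated and OS control unit to this RCT unit.
        t = min(free_t, key=lambda j: dist(x, os_treated[j]))
        c = min(free_c, key=lambda j: dist(x, os_control[j]))
        free_t.remove(t)
        free_c.remove(c)
        triplets.append((i, t, c))
    return triplets


# Hypothetical toy covariates (two covariates per unit).
rct = [(0.0, 1.0), (5.0, 5.0)]
os_treated = [(5.1, 4.9), (0.2, 1.1), (9.0, 9.0)]
os_control = [(0.1, 0.9), (4.8, 5.2)]
triplets = greedy_triplet_match(rct, os_treated, os_control)
```

A greedy rule is used here only because it is short and easy to follow; it depends on the order of the RCT units, and an optimal matching that minimizes total within-triplet distance would generally be preferable in practice.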