Bridging the gap between internal and external validity is crucial for heterogeneous treatment effect estimation. Randomised controlled trials (RCTs), favoured for their internal validity due to randomisation, often encounter challenges in generalising findings due to strict eligibility criteria. Observational studies, on the other hand, may provide stronger external validity through larger and more representative samples but can suffer from compromised internal validity due to unmeasured confounding. Motivated by these complementary characteristics, we propose a novel Bayesian nonparametric approach, Causal-ICM, leveraging multi-task Gaussian processes to integrate data from both RCTs and observational studies. In particular, we introduce a parameter that controls the degree of borrowing between the datasets and prevents the observational dataset from dominating the estimation. We propose a data-adaptive procedure for choosing the optimal value of the parameter. Causal-ICM outperforms other data fusion methods in point estimation across the covariate support of the observational study and provides principled uncertainty quantification for the estimated treatment effects. We demonstrate the robust performance of Causal-ICM in diverse scenarios through multiple simulation studies and a real-world study.
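To make the idea of a borrowing parameter concrete, the following is a minimal sketch of a two-task intrinsic coregionalisation model (ICM) kernel. It is an illustration under stated assumptions, not the Causal-ICM implementation: the names `rbf`, `icm_kernel` and `icm_posterior_mean`, the scalar `rho`, the RBF base kernel, the one-dimensional covariate, and the fixed noise level are all hypothetical choices made here. Task 0 stands for the RCT and task 1 for the observational study. With `rho = 0` the two tasks are modelled independently, and as `rho` approaches 1 the observational data inform the RCT task more strongly.

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on 1-D covariates (assumed base kernel)."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def icm_kernel(xa, ta, xb, tb, rho, lengthscale=1.0):
    """ICM covariance: task-similarity entry B[t, t'] times the base kernel k(x, x').

    rho in [-1, 1] is a hypothetical stand-in for a borrowing parameter;
    B is positive semi-definite for any rho in that range.
    """
    B = np.array([[1.0, rho], [rho, 1.0]])
    return B[np.ix_(ta, tb)] * rbf(xa, xb, lengthscale)

def icm_posterior_mean(x, task, y, x_new, task_new, rho, noise=0.1):
    """Standard GP posterior mean under the ICM kernel with Gaussian noise."""
    K = icm_kernel(x, task, x, task, rho) + noise**2 * np.eye(len(x))
    K_star = icm_kernel(x_new, task_new, x, task, rho)
    return K_star @ np.linalg.solve(K, y)
```

In this sketch, predictions for the RCT task at covariate values outside the RCT's range revert to the prior mean when `rho = 0` and are pulled towards the observational data as `rho` grows. How such a parameter should be chosen in a data-adaptive way is not specified here.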