Bridging the gap between internal and external validity is crucial for heterogeneous treatment effect estimation. Randomised controlled trials (RCTs), favoured for their internal validity due to randomisation, often encounter challenges in generalising findings due to strict eligibility criteria. Observational studies, on the other hand, offer greater external validity through larger and more representative samples but suffer from compromised internal validity due to unmeasured confounding. Motivated by these complementary characteristics, we propose a novel Bayesian nonparametric approach leveraging multi-task Gaussian processes to integrate data from both RCTs and observational studies. In particular, we introduce a parameter that controls the degree of borrowing between the datasets and prevents the observational dataset from dominating the estimation. The value of the parameter can be either set by the user or chosen through a data-adaptive procedure. Our approach outperforms other methods in point predictions across the covariate support of the observational study, and furthermore provides a calibrated measure of uncertainty for the estimated treatment effects, which is crucial when extrapolating. We demonstrate the robust performance of our approach in diverse scenarios through multiple simulation studies and a real-world education randomised trial.
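To make the idea of controlled borrowing concrete, the following is a minimal illustrative sketch and not the method proposed here. It assumes an intrinsic coregionalisation kernel for two tasks (0 = RCT, 1 = observational) whose task covariance is [[1, rho], [rho, 1]], with the hypothetical parameter `rho` standing in for the borrowing parameter. It models a single outcome surface rather than a treatment effect, and the data, the bias of 0.3, and all function names are invented for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def multitask_gp_mean(x_tr, t_tr, y_tr, x_te, t_te, rho, noise=0.1):
    """Posterior mean of a two-task GP with task covariance [[1, rho], [rho, 1]].

    t_tr and t_te hold integer task labels (0 = RCT, 1 = observational).
    rho is a hypothetical stand-in for the borrowing parameter (assumption).
    """
    B = np.array([[1.0, rho], [rho, 1.0]])
    # Elementwise product of the task covariance and the input kernel.
    K = B[t_tr[:, None], t_tr[None, :]] * rbf(x_tr, x_tr)
    K_star = B[t_te[:, None], t_tr[None, :]] * rbf(x_te, x_tr)
    alpha = np.linalg.solve(K + noise**2 * np.eye(len(x_tr)), y_tr)
    return K_star @ alpha

# Synthetic data (assumption): narrow RCT support, wide but biased observational data.
rng = np.random.default_rng(0)
x_rct = rng.uniform(-1.0, 1.0, 20)
x_obs = rng.uniform(-3.0, 3.0, 100)
y_rct = np.sin(x_rct) + 0.1 * rng.standard_normal(20)
y_obs = np.sin(x_obs) + 0.3 + 0.1 * rng.standard_normal(100)  # 0.3 = synthetic bias

x_tr = np.concatenate([x_rct, x_obs])
t_tr = np.concatenate([np.zeros(20, dtype=int), np.ones(100, dtype=int)])
y_tr = np.concatenate([y_rct, y_obs])

# Predict the RCT task across the wider observational support.
x_te = np.linspace(-3.0, 3.0, 7)
t_te = np.zeros(7, dtype=int)

mean_no_borrow = multitask_gp_mean(x_tr, t_tr, y_tr, x_te, t_te, rho=0.0)
mean_borrow = multitask_gp_mean(x_tr, t_tr, y_tr, x_te, t_te, rho=0.9)
# Reference: a GP fitted to the RCT data alone.
mean_rct_only = multitask_gp_mean(x_rct, t_tr[:20], y_rct, x_te, t_te, rho=0.0)
```

In this sketch, `rho = 0` makes the joint kernel block-diagonal, so predictions for the RCT task ignore the observational data entirely, while values closer to 1 let the observational data inform predictions outside the RCT's covariate range.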