Randomized controlled trials (RCTs) are the gold standard for causal inference, but they are often powered only for average effects, making estimation of heterogeneous treatment effects (HTEs) challenging. Conversely, large-scale observational studies (OS) offer a wealth of data but suffer from confounding bias. Our paper presents a novel framework that leverages OS data to improve the efficiency of estimating conditional average treatment effects (CATEs) from RCTs while mitigating common biases. We propose an approach for combining RCT and OS data that extends the traditional use of external control arms. The framework relaxes the typical assumption of CATE invariance across populations, acknowledging the often unaccounted-for systematic differences between RCT and OS participants. We demonstrate this through the special case of a linear outcome model in which the CATE differs sparsely between the two populations. The core of our framework relies on learning potential outcome means from OS data and using them as nuisance parameters in CATE estimation from RCT data. We further illustrate through experiments that using OS findings reduces the variance of the estimated CATE from RCTs and can decrease the sample size required to detect HTEs.
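The core idea, learning potential outcome means from OS data and plugging them in as a nuisance when estimating the CATE from RCT data, can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes a linear outcome model, a known RCT randomization probability `E_RCT`, observed confounding in the OS, and a standard augmented transformed-outcome regression swapped in for the paper's method. All names (`simulate`, `os_outcome_mean`, `rct_cate_coefs`, `TAU_RCT`, `TAU_OS`) and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical linear CATE coefficients (intercept first). The OS CATE differs
# from the RCT CATE in a single coordinate, i.e. a sparse difference.
TAU_RCT = np.array([1.0, 0.5, 0.0, 0.0])
TAU_OS = TAU_RCT + np.array([0.0, 0.3, 0.0, 0.0])
E_RCT = 0.5  # assumed known randomization probability in the RCT


def design(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta


def simulate(rng, n, tau, e=None):
    """Draw (X, A, Y). If e is None, treatment depends on X[:, 0] (OS-style)."""
    X = rng.normal(size=(n, len(tau) - 1))
    p = 1 / (1 + np.exp(-X[:, 0])) if e is None else np.full(n, e)
    A = rng.binomial(1, p)
    Y = 2.0 * X.sum(axis=1) + A * (design(X) @ tau) + rng.normal(size=n)
    return X, A, Y


def os_outcome_mean(X_os, A_os, Y_os, e_rct):
    """Fit per-arm outcome means on OS data and combine them into m(x)."""
    b1 = fit_linear(X_os[A_os == 1], Y_os[A_os == 1])
    b0 = fit_linear(X_os[A_os == 0], Y_os[A_os == 0])

    def m(X):
        D = design(X)
        return (1 - e_rct) * (D @ b1) + e_rct * (D @ b0)

    return m


def rct_cate_coefs(X, A, Y, e, m_hat=None):
    """Regress the transformed outcome on X; m_hat only affects variance."""
    offset = 0.0 if m_hat is None else m_hat(X)
    pseudo = (A - e) / (e * (1 - e)) * (Y - offset)
    return fit_linear(X, pseudo)


def run(n_reps=200, seed=0):
    """Repeat small RCTs and collect CATE coefficients with and without OS."""
    rng = np.random.default_rng(seed)
    m_hat = os_outcome_mean(*simulate(rng, 20000, TAU_OS), E_RCT)
    plain, aug = [], []
    for _ in range(n_reps):
        X, A, Y = simulate(rng, 300, TAU_RCT, e=E_RCT)
        plain.append(rct_cate_coefs(X, A, Y, E_RCT))
        aug.append(rct_cate_coefs(X, A, Y, E_RCT, m_hat))
    return np.array(plain), np.array(aug)


if __name__ == "__main__":
    plain, aug = run()
    print("coefficient variance, RCT only:", plain.var(axis=0))
    print("coefficient variance, with OS nuisance:", aug.var(axis=0))
```

Because the randomization probability is known, the transformed outcome has conditional mean equal to the RCT CATE for any choice of `m_hat`, so a mismatch between the OS and RCT populations costs efficiency rather than validity in this sketch. The closer the OS-learned means are to the RCT outcome means, the larger the variance reduction.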