Driven by the need to generate real-world evidence from multi-site collaborative studies, we introduce an efficient collaborative learning approach to evaluate the average treatment effect in a multi-site setting under data-sharing constraints. Specifically, the proposed method operates in a federated manner, using individual-level data from a user-defined target population and summary statistics from other source populations, to construct an efficient estimator of the average treatment effect in the target population of interest. Our federated approach does not require iterative communication between sites, making it particularly suitable for research consortia with limited resources for developing automated data-sharing infrastructure. Compared with existing data integration methods in causal inference, it allows distributional shifts in outcomes, treatments, and baseline covariates, and achieves the semiparametric efficiency bound under appropriate conditions. We illustrate the magnitude of the efficiency gains from incorporating extra data sources by examining the effect of insulin vs. non-insulin treatments on heart failure for patients with type II diabetes using electronic health record data collected from the All of Us program.