In response to the growing need for generating real-world evidence from multi-site collaborative studies, we introduce an efficient collaborative learning approach to evaluate the average treatment effect (ECO-ATE) in a multi-site setting under data-sharing constraints. Specifically, ECO-ATE operates in a federated manner, using individual-level data from a user-defined target population and summary statistics from other source populations to construct an efficient estimator for the average treatment effect on the target population of interest. Our federated approach does not require iterative communication between sites, making it particularly suitable for research consortia with limited resources for developing automated data-sharing infrastructure. Compared to existing data integration methods in causal inference, ECO-ATE allows for shifts in the distributions of outcomes, treatments, and baseline covariates, and achieves the semiparametric efficiency bound under appropriate conditions. We conduct simulation studies to demonstrate the extent of the efficiency gains achieved by incorporating additional data sources, as well as the robustness of our approach to varying levels of distributional shift and overparameterization, relative to existing benchmarks. We apply ECO-ATE to a case study examining the effect of insulin versus non-insulin treatments on heart failure in patients with type II diabetes, using electronic health record data collected from the All of Us program.