Causal inference typically assumes centralized access to individual-level data. Yet, in practice, data are often decentralized across multiple sites, making centralization infeasible due to privacy, logistical, or legal constraints. We address this problem by estimating the Average Treatment Effect (ATE) from decentralized observational data via a Federated Learning (FL) approach, allowing inference through the exchange of aggregate statistics rather than individual-level data. We propose a novel method to estimate propensity scores via a federated weighted average of local scores using Membership Weights (MW), defined as probabilities of site membership conditional on covariates. MW can be flexibly estimated with parametric or non-parametric classification models using standard FL algorithms. The resulting propensity scores are used to construct Federated Inverse Propensity Weighting (Fed-IPW) and Augmented IPW (Fed-AIPW) estimators. In contrast to meta-analysis methods, which fail when any site violates positivity, our approach exploits heterogeneity in treatment assignment across sites to improve overlap. We show that Fed-IPW and Fed-AIPW perform well under site-level heterogeneity in sample sizes, treatment mechanisms, and covariate distributions. Theoretical analysis and experiments on simulated and real-world data demonstrate clear advantages over meta-analysis and related approaches.
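As a rough illustration of the construction described above, the sketch below combines local propensity scores into a single score using membership weights, via the law of total probability e(x) = Σ_k P(site = k | x) · e_k(x), and then forms an IPW estimate of the ATE from per-site aggregates. The function names (`federated_propensity`, `local_ipw_summary`, `fed_ipw`) and the choice of which aggregates each site shares (a sum of IPW terms and a sample size) are assumptions made for illustration, not the authors' implementation. In practice the membership weights and local scores would be outputs of fitted models rather than given arrays.

```python
import numpy as np


def federated_propensity(membership_weights, local_scores):
    """Combine local scores into one propensity score per individual.

    membership_weights[i, k]: assumed estimate of P(site = k | x_i); rows sum to 1.
    local_scores[i, k]: assumed estimate of site k's propensity score at x_i.
    Returns e(x_i) = sum_k membership_weights[i, k] * local_scores[i, k].
    """
    mw = np.asarray(membership_weights, dtype=float)
    ls = np.asarray(local_scores, dtype=float)
    return np.sum(mw * ls, axis=1)


def local_ipw_summary(y, w, e):
    """Aggregates a site would share (hypothetical protocol):
    the sum of its IPW terms and its sample size."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    e = np.asarray(e, dtype=float)
    terms = w * y / e - (1.0 - w) * y / (1.0 - e)
    return float(np.sum(terms)), len(y)


def fed_ipw(summaries):
    """Pool per-site (sum, n) pairs into a single IPW estimate of the ATE."""
    total = sum(s for s, _ in summaries)
    n = sum(m for _, m in summaries)
    return total / n


# Toy usage with two sites and hand-picked numbers (not real data).
mw = [[0.5, 0.5], [1.0, 0.0]]   # assumed membership weights
ls = [[0.2, 0.8], [0.3, 0.9]]   # assumed local propensity scores
e = federated_propensity(mw, ls)
site_a = local_ipw_summary(y=[2.0], w=[1], e=e[:1])
site_b = local_ipw_summary(y=[1.0], w=[0], e=e[1:])
ate_hat = fed_ipw([site_a, site_b])
```

Only the per-site sums and counts cross site boundaries in this sketch. Note also that the combined score can lie strictly between 0 and 1 even where a single site's local score is 0 or 1, which is the sense in which heterogeneous treatment assignment across sites can improve overlap.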