Decentralized data sources are prevalent in real-world applications and pose a formidable challenge for causal inference. Owing to privacy constraints, these sources cannot be consolidated into a single entity, and differences in their data distributions, together with missing values within them, can bias the causal estimands. In this article, we propose a framework to estimate causal effects from decentralized data sources. The proposed framework avoids exchanging raw data among the sources, thus contributing towards privacy-preserving causal learning. Three instances of the proposed framework are introduced to estimate causal effects across diverse scenarios in a federated setting. (1) FedCI: a Bayesian framework based on Gaussian processes for estimating causal effects from federated observational data sources. It estimates the posterior distributions of the causal effects, from which it computes higher-order statistics that capture the uncertainty. (2) CausalRFF: an adaptive transfer algorithm that learns the similarities among the data sources by using Random Fourier Features to disentangle the loss function into multiple components, each associated with one data source. It estimates these similarities through transfer coefficients and hence requires no prior information about the similarity measures. (3) CausalFI: a new approach to federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources. It accounts for missing data under the missing-at-random assumption while also estimating higher-order statistics of the causal estimands. The proposed federated framework and its instances are an important step towards privacy-preserving causal learning models.
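CausalRFF depends on the loss splitting into one component per data source. The example below illustrates that idea as a minimal sketch. It rests on simplifying assumptions that do not come from the framework itself:

- a ridge-regularized outcome model over (covariates, treatment), with an RBF kernel approximated by random Fourier features;
- random features drawn from a common seed, so every source uses the same features;
- equal weights for all sources, with no learned transfer coefficients;
- randomized treatment and no missing values.

Because the penalized squared loss is a sum of per-source terms, each source only needs to share the summaries Z_jᵀZ_j and Z_jᵀy_j. From these, the server recovers the same parameters a pooled fit would give. The function names `rff_map`, `local_statistics`, `fit_global` and `local_effect_sum` are hypothetical.

```python
import numpy as np

def rff_map(inputs, W, b):
    """Random Fourier features z(u) = sqrt(2/D) cos(u W + b); z(u) . z(v) approximates an RBF kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(inputs @ W + b)

def local_statistics(X, t, y, W, b):
    """Runs inside one source: returns only D x D and D-dim summaries, never the raw (X, t, y)."""
    Z = rff_map(np.column_stack([X, t]), W, b)
    return Z.T @ Z, Z.T @ y

def fit_global(stats, lam):
    """Server side: sum_j ||y_j - Z_j theta||^2 + lam ||theta||^2 splits into one term per source,
    so summing the local summaries gives the same minimizer as pooling the raw data.
    Equal source weights are an assumption; no transfer coefficients are learned here."""
    A = sum(s[0] for s in stats)
    c = sum(s[1] for s in stats)
    return np.linalg.solve(A + lam * np.eye(A.shape[0]), c)

def local_effect_sum(X, theta, W, b):
    """Runs inside one source: sum over its units of f(x, t=1) - f(x, t=0), plus the unit count."""
    n = X.shape[0]
    z1 = rff_map(np.column_stack([X, np.ones(n)]), W, b)
    z0 = rff_map(np.column_stack([X, np.zeros(n)]), W, b)
    return float(np.sum((z1 - z0) @ theta)), n

# Toy data: three sources with shifted covariate distributions and a true effect of 2.0.
rng = np.random.default_rng(0)
D, lengthscale, lam = 300, 1.0, 1e-2
W = rng.normal(0.0, 1.0 / lengthscale, size=(2, D))  # identical features at every source (assumed shared seed)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

sources = []
for mu in (-1.0, 0.0, 1.0):
    X = rng.normal(mu, 1.0, size=(500, 1))
    t = rng.binomial(1, 0.5, size=500).astype(float)  # randomized treatment: no confounding in this toy
    y = np.sin(X[:, 0]) + 2.0 * t + rng.normal(0.0, 0.1, size=500)
    sources.append((X, t, y))

# Federated fit: only the per-source summaries reach the server.
theta_fed = fit_global([local_statistics(X, t, y, W, b) for X, t, y in sources], lam)
parts = [local_effect_sum(X, theta_fed, W, b) for X, _, _ in sources]
ate_fed = sum(s for s, _ in parts) / sum(n for _, n in parts)

# Pooled reference, possible only in this simulation where all data are visible.
Xp = np.vstack([X for X, _, _ in sources])
tp = np.concatenate([t for _, t, _ in sources])
yp = np.concatenate([y for _, _, y in sources])
Zp = rff_map(np.column_stack([Xp, tp]), W, b)
theta_pooled = np.linalg.solve(Zp.T @ Zp + lam * np.eye(D), Zp.T @ yp)

print(f"estimated average treatment effect: {ate_fed:.3f}")
print("max |theta_fed - theta_pooled|:", np.max(np.abs(theta_fed - theta_pooled)))
```

The federated and pooled parameter vectors should agree up to floating-point error, which is what makes exchanging raw data unnecessary in this setting. A weighted sum in `fit_global` is where per-source transfer coefficients would naturally enter. How CausalRFF actually learns them, and the Gaussian-process posterior of FedCI, are not reproduced here.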