Decentralized and incomplete data sources are prevalent in real-world applications and pose a formidable challenge for causal inference. Privacy constraints prevent these sources from being pooled into a single dataset, and their missing values can bias estimates of the causal estimands. We introduce a new approach for federated causal inference from incomplete data that estimates causal effects from multiple decentralized, incomplete data sources. The approach decomposes the loss function into multiple components, each corresponding to a specific data source with missing values. It accounts for missing data under the missing-at-random assumption while also estimating higher-order statistics of the causal estimands. To identify causal effects, the method recovers the conditional distribution of the missing confounders given the observed confounders from the decentralized data sources. The framework estimates heterogeneous causal effects without sharing raw training data among sources, which helps mitigate privacy risks. We demonstrate the efficacy of the approach through experiments on simulated and real-world data, illustrating its potential and practicality.
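To make the idea of a loss that splits into per-source components concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the method of this paper: it uses a toy linear outcome model with squared error, ignores missing values and the missing-at-random machinery entirely, and all names (`Record`, `local_gradient`, `federated_gradient`, `fit`) are hypothetical. It only illustrates that when the total loss is a sum over sources, each source can compute its own gradient contribution locally and share that summary instead of raw records.

```python
from typing import List, Tuple

# Hypothetical toy schema: (covariate x, outcome y). Not from the paper.
Record = Tuple[float, float]


def local_gradient(data: List[Record], w: float, b: float) -> Tuple[float, float, int]:
    """Gradient of the summed squared error on one source, plus its sample count."""
    gw = gb = 0.0
    for x, y in data:
        residual = (w * x + b) - y
        gw += 2.0 * residual * x
        gb += 2.0 * residual
    return gw, gb, len(data)


def federated_gradient(sources: List[List[Record]], w: float, b: float) -> Tuple[float, float]:
    """Combine per-source gradient sums into the gradient of the mean squared error.

    Only (gw, gb, n) leaves each source; raw records are never pooled.
    """
    total_gw = total_gb = 0.0
    total_n = 0
    for data in sources:
        gw, gb, n = local_gradient(data, w, b)
        total_gw += gw
        total_gb += gb
        total_n += n
    return total_gw / total_n, total_gb / total_n


def fit(sources: List[List[Record]], steps: int = 2000, lr: float = 0.05) -> Tuple[float, float]:
    """Plain gradient descent driven by the aggregated gradient."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = federated_gradient(sources, w, b)
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    # Two sources whose records all lie on y = 2x + 1 (made-up illustration data).
    sources = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(fit(sources))
```

Because the loss is additive across sources, the aggregated gradient matches the gradient that would be computed on the pooled data, up to floating-point rounding. Sharing gradient summaries can still leak information in practice, so this sketch should not be read as a privacy guarantee.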