Technologies for causal inference that preserve the privacy of distributed data have attracted considerable attention in recent years. To address this need, we propose a data collaboration quasi-experiment (DC-QE) that enables causal inference from distributed data while preserving privacy. In our method, local parties first construct dimensionality-reduced intermediate representations from their private data. Second, they share these intermediate representations instead of the private data to preserve privacy. Third, propensity scores are estimated from the shared intermediate representations. Finally, treatment effects are estimated from the propensity scores. Our method can reduce both random errors and biases, whereas existing methods can only reduce random errors in the estimation of treatment effects. Through numerical experiments on both artificial and real-world data, we confirmed that our method can lead to better estimation results than individual analyses. Dimensionality reduction discards some of the information in the private data and thus degrades performance. However, in our experiments, sharing intermediate representations among many parties compensated for the shortage of subjects and covariates, and the resulting improvement was enough to overcome the degradation caused by dimensionality reduction. As our method spreads, intermediate representations could be published as open data to help researchers identify causal relationships and could be accumulated as a knowledge base.
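The four steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: each party uses a private linear map (PCA directions mixed by a random matrix), a shareable anchor dataset is used to align the parties' representations by least squares, propensity scores come from a plain logistic regression, and the treatment effect is a normalized inverse-probability-weighted estimate. All function names are hypothetical.

```python
import numpy as np

def local_representation(X, anchor, dim, rng):
    """Party side: apply a private linear map to private rows and to the anchor rows."""
    center = anchor.mean(axis=0)                       # public centering (assumption)
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    F = vt[:dim].T @ rng.normal(size=(dim, dim))       # private d x dim map
    return (X - center) @ F, (anchor - center) @ F

def align(anchor_reps, reps):
    """Analyst side: map each party's representation into one common space.
    Using the first party's anchor representation as the target is a simplification."""
    target = anchor_reps[0]
    aligned = []
    for A, Z in zip(anchor_reps, reps):
        G, *_ = np.linalg.lstsq(A, target, rcond=None)  # solve A @ G ~ target
        aligned.append(Z @ G)
    return np.vstack(aligned)

def fit_logistic(Z, t, steps=2000, lr=0.1):
    """Estimate propensity scores with logistic regression (plain gradient descent)."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    Zb = np.hstack([np.ones((len(Zs), 1)), Zs])
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))
        w -= lr * Zb.T @ (p - t) / len(t)
    return 1.0 / (1.0 + np.exp(-Zb @ w))

def ipw_ate(y, t, e):
    """Normalized inverse-probability-weighted average treatment effect."""
    e = np.clip(e, 0.01, 0.99)                          # guard against extreme weights
    treated = np.sum(t * y / e) / np.sum(t / e)
    control = np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e))
    return treated - control

def run_demo(seed=0, n=200, d=5, dim=3):
    """Two parties holding different subjects with the same covariates (synthetic data)."""
    rng = np.random.default_rng(seed)
    anchor = rng.normal(size=(50, d))                   # shareable pseudo-data, not private
    scale = np.array([3.0, 2.0, 1.5, 0.5, 0.5])
    reps, anchor_reps, ys, ts = [], [], [], []
    for _ in range(2):
        X = rng.normal(size=(n, d)) * scale
        t = (rng.random(n) < 1 / (1 + np.exp(-0.5 * X[:, 0]))).astype(float)
        y = 2.0 * t + X[:, 0] + rng.normal(size=n)      # simulated true effect is 2.0
        Z, A = local_representation(X, anchor, dim, rng)
        reps.append(Z); anchor_reps.append(A); ys.append(y); ts.append(t)
    Z_all = align(anchor_reps, reps)                    # only representations are pooled
    t_all, y_all = np.concatenate(ts), np.concatenate(ys)
    e = fit_logistic(Z_all, t_all)
    return ipw_ate(y_all, t_all, e), e
```

Calling `run_demo()` returns the pooled effect estimate and the estimated propensity scores. In this sketch the raw covariate matrices never leave `local_representation`; only the reduced representations, treatments, and outcomes are pooled, which is one possible reading of the sharing step.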