Understanding causal relations is vital in scientific discovery. Causal structure learning identifies causal graphs from observational data in order to understand such relations. Usually, a central server performs this task, but sharing data with the server poses privacy risks. Federated learning can address this problem, but existing solutions for federated causal structure learning make unrealistic assumptions about data and lack convergence guarantees. FedC2SL is a federated constraint-based causal structure learning scheme that learns causal graphs using a federated conditional independence test, which examines whether two variables are conditionally independent given a conditioning set without collecting raw data from clients. FedC2SL requires weaker and more realistic assumptions about data and offers stronger resistance to data variability among clients. FedPC and FedFCI are the two variants of FedC2SL, for causal structure learning under causal sufficiency and causal insufficiency, respectively. The study evaluates FedC2SL against existing solutions on both synthetic datasets and real-world data and finds that it demonstrates encouraging performance and strong resilience to data heterogeneity among clients.
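The idea of testing conditional independence without collecting raw data can be illustrated with a minimal sketch. It is not FedC2SL's actual protocol, which the abstract does not specify. It assumes categorical variables, assumes that each client may reveal its exact local contingency counts (which a privacy-preserving scheme would likely protect further), and uses the Wilson-Hilferty approximation for the chi-square p-value. The names `local_counts`, `aggregate`, and `chi2_ci_test` are hypothetical.

```python
import math
from collections import Counter

def local_counts(rows):
    """Client side: count (x, y, z) combinations. Only counts leave the client."""
    return Counter(rows)

def aggregate(client_counts):
    """Server side: sum the per-client count tables (assumed aggregation step)."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)
    return total

def chi2_ci_test(counts):
    """Chi-square test of X independent of Y given Z on pooled counts.

    Returns (statistic, degrees_of_freedom, approximate_p_value).
    """
    xs = sorted({x for x, _, _ in counts})
    ys = sorted({y for _, y, _ in counts})
    zs = sorted({z for _, _, z in counts})
    stat, dof = 0.0, 0
    for z in zs:
        row = {x: sum(counts[(x, y, z)] for y in ys) for x in xs}
        col = {y: sum(counts[(x, y, z)] for x in xs) for y in ys}
        n_z = sum(row.values())
        for x in xs:
            for y in ys:
                expected = row[x] * col[y] / n_z
                if expected > 0:
                    stat += (counts[(x, y, z)] - expected) ** 2 / expected
        # Degrees of freedom use only the non-empty rows and columns of this stratum.
        r = sum(1 for v in row.values() if v > 0)
        c = sum(1 for v in col.values() if v > 0)
        dof += (r - 1) * (c - 1)
    if dof == 0:
        return stat, dof, 1.0
    # Wilson-Hilferty normal approximation to the chi-square upper tail.
    k = 2.0 / (9.0 * dof)
    z_score = ((stat / dof) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    p_value = 0.5 * math.erfc(z_score / math.sqrt(2.0))
    return stat, dof, p_value

# Two heterogeneous clients: each one only observes a single value of Z.
# Rows are (x, y, z) triples; within each Z stratum, X and Y are independent.
client_a = [(0, 0, 0)] * 40 + [(0, 1, 0)] * 20 + [(1, 0, 0)] * 20 + [(1, 1, 0)] * 10
client_b = [(0, 0, 1)] * 10 + [(0, 1, 1)] * 20 + [(1, 0, 1)] * 20 + [(1, 1, 1)] * 40

pooled = aggregate([local_counts(client_a), local_counts(client_b)])
stat, dof, p_value = chi2_ci_test(pooled)
print(stat, dof, p_value)
```

Because the summed counts equal the counts of the pooled data, the server obtains the same statistic a centralized test would, even though the two clients in this example have very different local distributions.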