Understanding causal relations is vital in scientific discovery. Causal structure learning identifies causal graphs from observational data in order to uncover such relations. A central server usually performs this task, but sharing data with the server poses privacy risks. Federated learning can address this problem, but existing solutions for federated causal structure learning make unrealistic assumptions about data and lack convergence guarantees. FedC2SL is a federated constraint-based causal structure learning scheme that learns causal graphs using a federated conditional independence test, which examines whether two variables are conditionally independent given a conditioning set without collecting raw data from clients. FedC2SL requires weaker and more realistic assumptions about data and offers stronger resistance to data variability among clients. FedPC and FedFCI are the two variants of FedC2SL, for causal structure learning under causal sufficiency and causal insufficiency, respectively. The study evaluates FedC2SL against existing solutions on both synthetic datasets and real-world data and finds that it delivers encouraging performance and strong resilience to data heterogeneity among clients.
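To make the idea of a federated conditional independence test concrete, the following is a minimal sketch and not the FedC2SL protocol itself. It assumes discrete variables, a chi-square test, and clients that share only contingency counts; the plain summation on the server is a stand-in for whatever privacy-preserving aggregation a real scheme would use. The names `local_counts`, `federated_ci_test`, and `chi2_sf` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter
from itertools import product


def local_counts(rows, x, y, z):
    """Client side: count (z-values, x-value, y-value) combinations.

    Only these counts are shared; the raw rows stay on the client.
    """
    return Counter((tuple(r[k] for k in z), r[x], r[y]) for r in rows)


def chi2_sf(stat, dof):
    """Upper tail probability of a chi-square distribution (dof >= 1)."""
    x = stat / 2.0
    if x <= 0:
        return 1.0
    # Closed-form base cases, then the recurrence
    # Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < dof / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1))
        a += 1.0
    return min(q, 1.0)


def federated_ci_test(client_counts):
    """Server side: pool counts, then run a stratified chi-square test."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)  # assumption: plain sum, no secure aggregation

    # One X-by-Y contingency table per observed value of the conditioning set.
    strata = {}
    for (zv, xv, yv), n in total.items():
        strata.setdefault(zv, Counter())[(xv, yv)] += n

    stat, dof = 0.0, 0
    for table in strata.values():
        xs = {xv for xv, _ in table}
        ys = {yv for _, yv in table}
        n = sum(table.values())
        row = {xv: sum(table[(xv, yv)] for yv in ys) for xv in xs}
        col = {yv: sum(table[(xv, yv)] for xv in xs) for yv in ys}
        for xv, yv in product(xs, ys):
            expected = row[xv] * col[yv] / n
            stat += (table[(xv, yv)] - expected) ** 2 / expected
        dof += (len(xs) - 1) * (len(ys) - 1)

    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value


def make_rows(z, table, copies=4):
    """Expand a small count table into individual records (toy data)."""
    return [{"X": x, "Y": y, "Z": z}
            for (x, y), n in table.items() for _ in range(n * copies)]


# Two heterogeneous clients: each one only ever observes a single value of Z.
client_a = make_rows(0, {(0, 0): 9, (0, 1): 3, (1, 0): 3, (1, 1): 1})
client_b = make_rows(1, {(0, 0): 1, (0, 1): 3, (1, 0): 3, (1, 1): 9})
clients = [client_a, client_b]

# Test X and Y given Z, then X and Y with an empty conditioning set.
stat_c, dof_c, p_c = federated_ci_test(
    [local_counts(c, "X", "Y", ["Z"]) for c in clients])
stat_m, dof_m, p_m = federated_ci_test(
    [local_counts(c, "X", "Y", []) for c in clients])
print("X vs Y given Z:", stat_c, dof_c, p_c)
print("X vs Y marginally:", stat_m, dof_m, p_m)
```

In this toy data, X and Y are independent within each value of Z but dependent once the clients' counts are pooled without conditioning. Neither client could detect this alone, because each holds only one stratum of Z. A constraint-based learner such as PC or FCI would call a test like this repeatedly with different conditioning sets.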