Federated Graph Learning (FGL) has recently attracted significant attention as a distributed framework based on graph neural networks, primarily because it can break data silos. Existing FGL studies by default apply a community split to a homophilous global graph to simulate federated semi-supervised node classification settings. This strategy assumes that the topology of the multi-client subgraphs is consistent with that of the global graph, in which connected nodes are highly likely to have similar feature distributions and the same label. In real-world deployments, however, differing perspectives in local data engineering produce differing subgraph topologies, which poses heterogeneity challenges unique to FGL. Unlike the well-known label non-independent and identically distributed (Non-iid) problem in federated learning, FGL heterogeneity essentially reflects topological divergence among clients, namely homophily or heterophily. To simulate and handle this challenge, we introduce the concept of a structure Non-iid split and present a new paradigm called \underline{Ada}ptive \underline{F}ederated \underline{G}raph \underline{L}earning (AdaFGL), a decoupled two-step personalized approach. First, AdaFGL employs standard multi-client federated collaborative training and obtains the federated knowledge extractor by aggregating the uploaded models at the server in the final round. Then, each client conducts personalized training based on its local subgraph and the federated knowledge extractor. Extensive experiments on 12 graph benchmark datasets validate the superior performance of AdaFGL over state-of-the-art baselines. Specifically, in terms of test accuracy, AdaFGL outperforms the baselines by significant margins of 3.24\% and 5.57\% on the community split and the structure Non-iid split, respectively.
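
To make the notion of topological divergence concrete, the following minimal sketch computes the edge homophily ratio, i.e. the fraction of edges whose endpoints share a label, for two toy client subgraphs. The abstract does not state which homophily measure is used, so this common definition is an assumption, and the function name `edge_homophily` and the toy edge lists are hypothetical illustrations rather than part of AdaFGL.

```python
from typing import Hashable, Iterable, Mapping, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_homophily(edges: Iterable[Edge], labels: Mapping[Hashable, Hashable]) -> float:
    """Return the fraction of edges whose two endpoints carry the same label.

    Assumption: this is the standard edge homophily ratio, used here only
    for illustration; the abstract does not specify a particular measure.
    """
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("edge list is empty; homophily is undefined")
    same_label = sum(1 for u, v in edge_list if labels[u] == labels[v])
    return same_label / len(edge_list)


# Hypothetical toy data: four labeled nodes, seen through two clients' subgraphs.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

# Client 1: most edges connect nodes with the same label (more homophilous).
client_homophilous = [(0, 1), (2, 3), (0, 2)]

# Client 2: most edges connect nodes with different labels (more heterophilous).
client_heterophilous = [(0, 2), (0, 3), (1, 2), (2, 3)]

print(edge_homophily(client_homophilous, labels))
print(edge_homophily(client_heterophilous, labels))
```

Two clients holding the same labeled nodes can thus observe very different homophily ratios depending on which edges their local data engineering produces, which is the kind of divergence a structure Non-iid split is meant to simulate.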