Graph Neural Networks (GNNs) unlock new ways of learning from graph-structured data, proving highly effective at capturing complex relationships and patterns. Federated GNNs (FGNNs) have emerged as a prominent distributed learning paradigm for training GNNs over decentralized data. However, FGNNs face two significant challenges: high communication overhead from multiple rounds of parameter exchange, and non-IID data characteristics across clients. To address these issues, we introduce CeFGC, a novel FGNN paradigm that enables efficient GNN training over non-IID data while limiting server-client communication to only three rounds. The core idea of CeFGC is to leverage generative diffusion models to minimize direct client-server communication. Each client trains a generative diffusion model that captures its local graph distribution and shares this model with the server, which then redistributes it to all clients. Using these generative models, each client generates synthetic graphs and combines them with its local graphs to train a local GNN model. Finally, clients upload their model weights to the server for aggregation into a global GNN model. We theoretically analyze the communication complexity of CeFGC, showing that it requires only a constant three communication rounds. Extensive experiments on several real graph datasets demonstrate the effectiveness and efficiency of CeFGC against state-of-the-art competitors; its superior performance on non-IID graphs stems from aligning local and global model objectives and enriching each client's training set with diverse synthetic graphs.
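The three-round protocol described above can be sketched as follows. This is a minimal, illustrative simulation only: the diffusion models, samplers, GNN training, and aggregation are replaced by toy stand-ins, and all function names here are assumptions for illustration, not the paper's actual implementation.

```python
# Toy sketch of the CeFGC three-round protocol. All components are
# illustrative stand-ins, not the paper's implementation.

def train_diffusion(local_graphs):
    # Stand-in "generative model": simply replays its training graphs.
    return list(local_graphs)

def sample(generator, n):
    # Stand-in sampler: draws n synthetic graphs from the generator.
    return [generator[i % len(generator)] for i in range(n)]

def train_gnn(graphs):
    # Stand-in GNN training: "weights" are just the mean graph size.
    return sum(len(g) for g in graphs) / len(graphs)

def cefgc(clients, n_synthetic=2):
    # Round 1: each client uploads its generative model to the server.
    generators = [train_diffusion(c) for c in clients]
    # Round 2: the server redistributes all generators; each client then
    # trains locally on its own graphs plus synthetic graphs sampled
    # from every client's generator.
    local_weights = []
    for c in clients:
        synthetic = [g for gen in generators for g in sample(gen, n_synthetic)]
        local_weights.append(train_gnn(list(c) + synthetic))
    # Round 3: clients upload model weights; the server aggregates them
    # (FedAvg-style averaging in this sketch) into the global model.
    return sum(local_weights) / len(local_weights)

# Toy "graphs" are just node lists here; two clients with non-IID data.
clients = [[["a", "b"], ["c"]], [["d", "e", "f"]]]
global_model = cefgc(clients)
```

Note how only the generators and the final model weights ever cross the network, independently of how many local training epochs each client runs — which is the source of the constant three-round bound.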