Methods for training models on graphs distributed across multiple clients have recently grown in popularity, due to both the size of these graphs and regulations that require data to stay where it is generated. However, edges naturally exist between nodes held by different clients. Distributed methods for training a model on a single graph therefore incur either significant communication overhead between clients or a loss of information available for training. We introduce the Federated Graph Convolutional Network (FedGCN) algorithm, which uses federated learning to train GCN models for semi-supervised node classification with fast convergence and little communication. Unlike prior methods, which require extra communication among clients in every training round, FedGCN clients communicate with the central server only once, in a pre-training step. This greatly reduces communication costs and allows the use of homomorphic encryption to further enhance privacy. We theoretically analyze the tradeoff between FedGCN's convergence rate and communication cost under different data distributions. Experimental results show that FedGCN achieves better model accuracy with 51.7% faster convergence on average and at least 100x less communication compared to prior work.
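To make the single pre-training communication step concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that each client knows the edges incident to its own nodes (including cross-client edges) but holds features only for the nodes it owns, and that the pre-training step amounts to the server summing per-client partial neighbor-feature sums. The function names `local_neighbor_sums` and `server_aggregate` and the toy graph are hypothetical, introduced only for illustration.

```python
def local_neighbor_sums(edges, features, num_nodes, dim):
    """One client's partial sums: row i accumulates the features of i's
    neighbors that this client owns (those present in `features`)."""
    partial = [[0.0] * dim for _ in range(num_nodes)]
    for i, j in edges:  # undirected edges known to this client
        if j in features:
            for k in range(dim):
                partial[i][k] += features[j][k]
        if i in features:
            for k in range(dim):
                partial[j][k] += features[i][k]
    return partial


def server_aggregate(partials):
    """Server adds the clients' partial sums element-wise."""
    num_nodes, dim = len(partials[0]), len(partials[0][0])
    total = [[0.0] * dim for _ in range(num_nodes)]
    for p in partials:
        for i in range(num_nodes):
            for k in range(dim):
                total[i][k] += p[i][k]
    return total


# Toy path graph 0-1-2-3 (hypothetical data).
# Client A owns nodes {0, 1}; client B owns nodes {2, 3}.
# Edge (1, 2) crosses clients, so both clients list it.
feats_a = {0: [1.0, 0.0], 1: [0.0, 2.0]}
feats_b = {2: [3.0, 1.0], 3: [0.5, 0.5]}
edges_a = [(0, 1), (1, 2)]
edges_b = [(1, 2), (2, 3)]

partial_a = local_neighbor_sums(edges_a, feats_a, num_nodes=4, dim=2)
partial_b = local_neighbor_sums(edges_b, feats_b, num_nodes=4, dim=2)

# Row i now equals the sum of features over all of node i's neighbors,
# as if the graph had never been partitioned.
aggregated = server_aggregate([partial_a, partial_b])
```

Under these assumptions the server performs only additions, which is the kind of operation additively homomorphic encryption schemes support, and each client would need back only the rows for the nodes it owns. After this one exchange, training could proceed with ordinary federated averaging and no further client-to-client traffic.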