Methods for training models on graphs distributed across multiple clients have recently grown in popularity, driven by the size of these graphs and by regulations that require data to stay where it is generated. However, a single connected graph cannot be partitioned disjointly across multiple clients, because cross-client edges connect nodes held by different clients. Distributed methods for training a model on a single graph therefore either incur significant communication overhead between clients or lose information available for training. We introduce the Federated Graph Convolutional Network (FedGCN) algorithm, which uses federated learning to train GCN models for semi-supervised node classification with fast convergence and little communication. Whereas prior methods require communication among clients at each training round, FedGCN clients communicate with the central server only in a single pre-training step, which greatly reduces communication costs and allows homomorphic encryption to be used to further enhance privacy. We theoretically analyze the tradeoff between FedGCN's convergence rate and communication cost under different data distributions. Experimental results show that FedGCN achieves better model accuracy, with 51.7% faster convergence on average and at least 100× less communication than prior work.
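The abstract does not spell out what is exchanged in the single pre-training step. One plausible reading, sketched below under that assumption, is that each client sends the server per-node sums of the neighbor features it holds, and the server adds these partial sums. Because neighbor aggregation is a plain sum, the server's total matches what a centralized aggregation over the full graph would compute, cross-client edges included. The function names, the toy graph, and the feature values are all hypothetical, and the sketch works on plaintext: it only illustrates why an additive operation on the server is the kind of step that additively homomorphic encryption could protect.

```python
# Illustrative sketch only: names and data are assumptions, not the paper's code.

def local_neighbor_sums(edges, local_features, num_nodes, dim):
    """For every node in the graph, sum the features of its neighbors
    that this client holds. Neighbors held elsewhere contribute nothing."""
    sums = [[0.0] * dim for _ in range(num_nodes)]
    for target, source in edges:
        if source in local_features:
            for k in range(dim):
                sums[target][k] += local_features[source][k]
    return sums

def server_aggregate(partials):
    """Server side: add the clients' partial sums element-wise."""
    num_nodes, dim = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][k] for p in partials) for k in range(dim)]
            for i in range(num_nodes)]

# Toy path graph 0-1-2-3; the edge (1, 2) crosses the two clients.
undirected = [(0, 1), (1, 2), (2, 3)]
num_nodes, dim = 4, 2
# Directed (target, source) pairs in both directions, plus self-loops.
edges = ([(u, v) for u, v in undirected]
         + [(v, u) for u, v in undirected]
         + [(i, i) for i in range(num_nodes)])

features = {0: [1, 0], 1: [0, 1], 2: [2, 2], 3: [3, 1]}
client_nodes = [[0, 1], [2, 3]]

# Each client sees only the features of its own nodes.
partials = [
    local_neighbor_sums(edges, {n: features[n] for n in owned}, num_nodes, dim)
    for owned in client_nodes
]
aggregated = server_aggregate(partials)

# Reference: the same aggregation with all features in one place.
centralized = local_neighbor_sums(edges, features, num_nodes, dim)
assert aggregated == centralized
```

For simplicity every client is given the full edge list here; in a real deployment a client would presumably know only the edges incident to its own nodes. After this one exchange, each client could run ordinary federated training on the aggregated features without further client-to-client communication, which is the communication saving the abstract describes.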