Vertical federated learning (VFL) is a distributed learning paradigm in which computing clients collectively train a model, each holding a subset of the features of the same set of samples. Current research on VFL focuses on the case where samples are independent and rarely addresses an emerging scenario in which samples are interrelated through a graph. For graph-structured data, graph neural networks (GNNs) are competitive machine learning models, but a naive implementation in the VFL setting incurs significant communication overhead. Moreover, analyzing the training is challenging because the stochastic gradients are biased. In this paper, we propose a model splitting method that splits a backbone GNN across the clients and the server, together with a communication-efficient algorithm, GLASU, to train such a model. GLASU adopts lazy aggregation and stale updates to skip aggregation when evaluating the model and skip feature exchanges during training, greatly reducing communication. We offer a theoretical analysis and conduct extensive numerical experiments on real-world datasets, showing that the proposed algorithm effectively trains a GNN model whose performance matches that of the backbone GNN trained in a centralized manner.
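To give a rough sense of why lazy aggregation and stale updates reduce communication, the following toy count may help. It is only a sketch under stated assumptions, not the GLASU algorithm itself: it assumes that lazy aggregation means the clients and server aggregate at only a chosen subset of GNN layers, and that stale updates mean fresh features are exchanged only once every few training iterations. The function names and the parameters `agg_layers` and `stale_period` are hypothetical and do not come from the abstract.

```python
from math import ceil
from typing import Iterable


def naive_exchanges(num_iterations: int, num_layers: int) -> int:
    """Count exchanges when aggregation happens at every layer of every iteration."""
    return num_iterations * num_layers


def lazy_stale_exchanges(
    num_iterations: int, agg_layers: Iterable[int], stale_period: int
) -> int:
    """Count exchanges under the two assumed savings.

    Lazy aggregation (assumed): aggregate only at the layers in `agg_layers`.
    Stale updates (assumed): exchange fresh features once every `stale_period`
    iterations; in between, clients reuse the last exchanged values.
    """
    if stale_period < 1:
        raise ValueError("stale_period must be at least 1")
    # Number of iterations in which an exchange actually takes place.
    rounds_with_exchange = ceil(num_iterations / stale_period)
    # Duplicate layer indices are counted once.
    return rounds_with_exchange * len(set(agg_layers))


if __name__ == "__main__":
    # Hypothetical setting: 100 iterations, a 4-layer GNN,
    # aggregation at layers 1 and 3 only, exchange every 5 iterations.
    print(naive_exchanges(100, 4))               # 400
    print(lazy_stale_exchanges(100, [1, 3], 5))  # 40
```

In this made-up setting the exchange count drops by a factor of ten. The actual savings, and their effect on accuracy, depend on the model and on the choices analyzed in the paper.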