In many realistic applications, behemoth graphs are fragmented and stored separately by multiple data owners as distributed subgraphs. To obtain globally generalized graph mining models without harming data privacy, it is natural to consider the subgraph federated learning (subgraph FL) scenario, where each local client holds a subgraph of the entire global graph. The unique challenge in this setting is incomplete information propagation on local subgraphs due to missing cross-subgraph neighbors. To overcome it, previous works augment local neighborhoods through the joint FL of missing neighbor generators and graph neural networks (GNNs). Yet their technical designs have profound limitations regarding the utility, efficiency, and privacy goals of FL. In this work, we propose FedDEP to comprehensively tackle these challenges in subgraph FL. FedDEP consists of a series of novel technical designs: (1) deep neighbor generation that leverages the GNN embeddings of potential missing neighbors; (2) efficient pseudo-FL for neighbor generation through embedding prototyping; and (3) privacy protection through noise-less edge-local differential privacy. We analyze the correctness and efficiency of FedDEP, and provide theoretical guarantees on its privacy. Empirical results on four real-world datasets demonstrate the clear benefits of the proposed techniques.