Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitute, has brought immediate benefits to various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is infeasible for real-world decentralized data, and overlook data holders' privacy-preserving requirements. To bridge this gap, we propose and study the novel problem of federated graph condensation for graph neural networks (GNNs). Specifically, we first propose a general framework for federated graph condensation, in which we decouple the typical gradient-matching process for graph condensation into client-side gradient calculation and server-side gradient matching. In this way, the burdensome client-side computation cost is largely alleviated. Moreover, our empirical studies show that under the federated setting, the condensed graph consistently leaks data membership privacy: the condensed graph produced during federated training can be exploited by membership inference attacks (MIAs) to reveal whether particular examples were used in training. To tackle this issue, we incorporate the information bottleneck principle into federated graph condensation, which only requires extracting partial node features in a single local pre-training step and reusing these features throughout federated training. Extensive experiments on real-world datasets demonstrate that our framework consistently protects membership privacy during training, while achieving performance comparable to, and in some cases superior to, existing centralized graph condensation and federated graph learning methods.
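The client/server decoupling of gradient matching can be illustrated with a minimal sketch. This is not the paper's implementation: a plain linear regression model stands in for the GNN, node features are treated as plain vectors (graph structure omitted), and all sizes, learning rates, and the analytic gradient-matching update are illustrative assumptions. The client computes the loss gradient on its real data and sends only that gradient; the server updates the small synthetic (condensed) dataset to match it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a linear model stands in for the client's GNN.
n_real, n_syn, dim = 64, 8, 5
X_real = rng.normal(size=(n_real, dim))
w_true = rng.normal(size=dim)
y_real = X_real @ w_true + 0.01 * rng.normal(size=n_real)


def client_gradient(X, y, w):
    """Client-side step: gradient of the squared loss (1/2m)||Xw - y||^2 w.r.t. w."""
    m = len(y)
    return X.T @ (X @ w - y) / m


# Server-side state: a small condensed dataset, updated by gradient matching.
X_syn = rng.normal(size=(n_syn, dim))
y_syn = X_syn @ w_true  # synthetic labels held fixed for simplicity
w = rng.normal(size=dim)

g_real = client_gradient(X_real, y_real, w)  # the only thing the client sends
init_gap = np.linalg.norm(X_syn.T @ (X_syn @ w - y_syn) / n_syn - g_real)

lr = 0.02
for _ in range(1000):
    r = X_syn @ w - y_syn
    g_syn = X_syn.T @ r / n_syn
    d = g_syn - g_real  # gradient-matching residual
    # Analytic gradient of ||g_syn - g_real||^2 w.r.t. X_syn (valid for the
    # linear model only; a real implementation would use autograd).
    grad_X = (2.0 / n_syn) * (np.outer(r, d) + np.outer(X_syn @ d, w))
    X_syn -= lr * grad_X

final_gap = np.linalg.norm(X_syn.T @ (X_syn @ w - y_syn) / n_syn - g_real)
print(init_gap, final_gap)  # the matching distance shrinks
```

The point of the decoupling is visible in the data flow: the client's cost is a single gradient evaluation per round, while the (comparatively expensive) optimization of the condensed data happens entirely on the server.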
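The membership-leakage concern can also be illustrated with a toy loss-threshold attack. Again, this is an illustrative assumption, not the paper's attack: an over-parameterized logistic model memorizes random labels on its "member" points, so members end up with much lower loss than fresh non-members, and an attacker who can query losses can guess membership by thresholding.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setting: more dimensions than samples and random labels,
# so the model can memorize its training ("member") points.
n, dim = 20, 40
X_mem = rng.normal(size=(n, dim))
y_mem = rng.integers(0, 2, size=n).astype(float)
X_non = rng.normal(size=(n, dim))
y_non = rng.integers(0, 2, size=n).astype(float)


def sigmoid(z):
    # Clip logits for numerical stability.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


# Overfit a logistic model on the members with plain gradient descent.
w = np.zeros(dim)
for _ in range(3000):
    p = sigmoid(X_mem @ w)
    w -= 0.5 * X_mem.T @ (p - y_mem) / n


def nll(X, y):
    """Per-example negative log-likelihood under the trained model."""
    p = sigmoid(X @ w)
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))


# Loss-threshold attack: points with loss below the threshold are guessed
# to be members; here the threshold is the median of the combined losses.
tau = np.median(np.concatenate([nll(X_mem, y_mem), nll(X_non, y_non)]))
attack_acc = 0.5 * (np.mean(nll(X_mem, y_mem) < tau)
                    + np.mean(nll(X_non, y_non) >= tau))
print(attack_acc)
```

The gap between member and non-member losses is exactly the signal that a condensed graph leaking membership information would expose; defenses such as the information-bottleneck feature extraction aim to shrink that gap.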