As the field of Graph Neural Networks (GNNs) continues to grow, so does the need for large, real-world datasets on which to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on them hard, if not impossible. This greatly reduces the number of benchmark graphs available to researchers and leaves the field relying on only a handful of publicly available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT), that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. More specifically, CGT (1) generates effective benchmark graphs on which GNNs show task performance similar to that on the source graphs, (2) scales to large-scale graphs, and (3) incorporates off-the-shelf privacy modules to guarantee end-user privacy in the generated graphs. Extensive experiments against a wide range of graph generative models show that only our model can successfully generate privacy-controlled synthetic substitutes for large-scale real-world graphs that can be effectively used to benchmark GNN models.