The prevalence of large-scale graphs poses great challenges in time and storage for training and deploying graph neural networks (GNNs). Several recent works have explored pruning the large original graph into a small, highly informative one, such that training and inference on the pruned graph and on the original graph yield comparable performance. Although empirically effective, existing studies focus on static or non-temporal graphs and are not directly applicable to dynamic scenarios. In addition, they require labels as ground truth to learn the informative structure, which limits their applicability to new problem domains where labels are hard to obtain. To resolve this dilemma, we propose and study the problem of unsupervised graph pruning on dynamic graphs. We approach the problem with STEP, a self-supervised temporal pruning framework that learns to remove potentially redundant edges from input dynamic graphs. From both a technical and an industrial viewpoint, our method overcomes the trade-off between performance and time and memory overheads. Results on three real-world datasets demonstrate its advantages in improving the efficacy, robustness, and efficiency of GNNs on dynamic node classification tasks. Most notably, STEP is able to prune more than 50% of the edges of Alipay, a million-scale industrial graph (7M nodes, 21M edges), while retaining up to 98% of the original performance. Code is available at https://github.com/EdisonLeeeee/STEP.