The rapid development of Internet technology has given rise to a vast amount of graph-structured data. Graph Neural Networks (GNNs), an effective method for various graph mining tasks, incur substantial computational costs when dealing with large-scale graph data. A data-centric solution has been proposed: condense the large graph dataset into a smaller one without sacrificing the predictive performance of GNNs. However, existing efforts condense graph-structured data through a computationally intensive bi-level optimization architecture and therefore also suffer from massive computation costs. In this paper, we propose reformulating the graph condensation problem as a Kernel Ridge Regression (KRR) task instead of iteratively training GNNs in the inner loop of bi-level optimization. More specifically, we propose a novel dataset condensation framework (GC-SNTK) for graph-structured data, in which a Structure-based Neural Tangent Kernel (SNTK) is developed to capture the graph topology and serves as the kernel function in the KRR paradigm. Comprehensive experiments demonstrate the effectiveness of our proposed model in accelerating graph condensation while maintaining high prediction performance.
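To illustrate why a KRR formulation can avoid an inner training loop, the following minimal sketch shows the standard closed-form KRR predictor, fitted on a small (condensed) set and evaluated on a larger one. It is an illustration under stated assumptions, not the GC-SNTK implementation: the RBF kernel is only a stand-in for the SNTK, no graph structure is used, and the names `rbf_kernel`, `krr_predict`, `condensation_loss`, `gamma`, and `ridge` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    # Stand-in kernel (NOT the paper's SNTK): exp(-gamma * ||a_i - b_j||^2).
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def krr_predict(x_small, y_small, x_large, ridge=1e-3, kernel=rbf_kernel):
    # Closed-form KRR: K_ls (K_ss + ridge * I)^{-1} Y_s.
    # The linear solve replaces iterative model training in the inner loop.
    k_ss = kernel(x_small, x_small)
    k_ls = kernel(x_large, x_small)
    alpha = np.linalg.solve(k_ss + ridge * np.eye(len(x_small)), y_small)
    return k_ls @ alpha


def condensation_loss(x_small, y_small, x_large, y_large, ridge=1e-3):
    # Squared error of the small-set KRR predictor on the large set.
    # In a condensation setting, x_small (and possibly y_small) would be
    # the variables being optimized against this loss.
    pred = krr_predict(x_small, y_small, x_large, ridge)
    return 0.5 * float(np.mean((pred - y_large) ** 2))


# Toy data (random, purely illustrative): 50 "original" nodes, 5 condensed ones.
rng = np.random.default_rng(0)
x_large = rng.normal(size=(50, 4))
y_large = np.eye(3)[rng.integers(0, 3, size=50)]  # one-hot labels, 3 classes
x_small = rng.normal(size=(5, 4))
y_small = np.eye(3)[rng.integers(0, 3, size=5)]

loss = condensation_loss(x_small, y_small, x_large, y_large)
```

Because the predictor is a closed-form function of the small set, the loss can be differentiated with respect to `x_small` directly (for example with an automatic differentiation library), which is the general reason a kernel formulation can sidestep bi-level optimization.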