The rapid development of Internet technology has given rise to a vast amount of graph-structured data. Graph Neural Networks (GNNs), an effective method for various graph mining tasks, incur substantial computational costs when dealing with large-scale graph data. A data-centric solution has been proposed: condense the large graph dataset into a smaller one without sacrificing the predictive performance of GNNs. However, existing efforts condense graph-structured data through a computationally intensive bi-level optimization architecture and therefore also suffer from massive computation costs. In this paper, we propose reformulating the graph condensation problem as a Kernel Ridge Regression (KRR) task instead of iteratively training GNNs in the inner loop of bi-level optimization. More specifically, we propose a novel dataset condensation framework (GC-SNTK) for graph-structured data, in which a Structure-based Neural Tangent Kernel (SNTK) is developed to capture the topology of the graph and serves as the kernel function in the KRR paradigm. Comprehensive experiments demonstrate the effectiveness of our proposed model in accelerating graph condensation while maintaining high prediction performance. The source code is available at https://github.com/WANGLin0126/GCSNTK.
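
To make the KRR reformulation concrete, the following is a minimal sketch of how a closed-form kernel ridge regression can stand in for the inner training loop: the small (condensed) set acts as the support set, and the loss measures how well it predicts the labels of the large graph. This is an illustration under stated assumptions, not the GC-SNTK implementation. The function names (`aggregate`, `toy_kernel`, `krr_predict`, `condensation_loss`), the ridge value, and the mean-aggregation step are all hypothetical, and `toy_kernel` is a plain linear kernel used as a placeholder where the paper uses the SNTK.

```python
import numpy as np

def aggregate(adj, feats):
    """Mean-aggregate each node's features over itself and its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    return (a_hat / a_hat.sum(axis=1, keepdims=True)) @ feats

def toy_kernel(x_a, x_b):
    """Placeholder linear kernel (an assumption; NOT the SNTK from the paper)."""
    return x_a @ x_b.T

def krr_predict(x_target, x_support, y_support, ridge=1e-2):
    """Closed-form KRR prediction: K_ts (K_ss + ridge * I)^{-1} Y_s."""
    k_ss = toy_kernel(x_support, x_support)
    k_ts = toy_kernel(x_target, x_support)
    alpha = np.linalg.solve(k_ss + ridge * np.eye(k_ss.shape[0]), y_support)
    return k_ts @ alpha

def condensation_loss(x_large, y_large, x_small, y_small, ridge=1e-2):
    """Squared error on the large graph's labels when predicted from the small set."""
    pred = krr_predict(x_large, x_small, y_small, ridge)
    return 0.5 * np.mean((pred - y_large) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_large, n_small, dim, n_classes = 50, 5, 8, 3

    # Random symmetric adjacency matrix and features for the "large" graph.
    upper = np.triu(rng.random((n_large, n_large)) < 0.1, k=1).astype(float)
    adj = upper + upper.T
    x_large = aggregate(adj, rng.normal(size=(n_large, dim)))
    y_large = np.eye(n_classes)[rng.integers(0, n_classes, size=n_large)]

    # Randomly initialised condensed set (features and one-hot labels).
    x_small = rng.normal(size=(n_small, dim))
    y_small = np.eye(n_classes)[rng.integers(0, n_classes, size=n_small)]

    print("condensation loss:", condensation_loss(x_large, y_large, x_small, y_small))
```

In a full condensation pipeline this loss would presumably be minimized with respect to the condensed data using a gradient-based optimizer, which the sketch omits. The point it illustrates is that the prediction step has a closed form, so no GNN needs to be trained inside the loop.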