The burdensome cost of training on large-scale graphs has sparked significant interest in graph condensation, which involves training Graph Neural Networks (GNNs) on a small condensed graph for use on the large-scale original graph. Existing methods primarily focus on aligning key metrics between the condensed and original graphs, such as the gradients, distributions, and training trajectories of GNNs, yielding satisfactory performance on downstream tasks. However, these complex metrics require intricate computations and can disrupt the optimization of the condensed graph, making the condensation process highly demanding and unstable. Motivated by the recent success of simplified models in various fields, we propose a simplified approach to metric alignment in graph condensation, aiming to reduce unnecessary complexity inherited from GNNs. Our approach eliminates external parameters and retains only the target condensed graph during the condensation process. Following the hierarchical aggregation principles of GNNs, we introduce the Simple Graph Condensation (SimGC) framework, which aligns the condensed graph with the original graph from the input layer to the prediction layer, guided by a Simple Graph Convolution (SGC) model pre-trained on the original graph. As a result, both graphs possess a similar capability to train GNNs. This straightforward yet effective strategy achieves a speedup of up to 10 times over existing graph condensation methods while performing on par with state-of-the-art baselines. Comprehensive experiments on seven benchmark datasets demonstrate the effectiveness of SimGC in prediction accuracy, condensation time, and generalization capability. Our code will be made publicly available.
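For intuition, the following is a minimal PyTorch sketch of what SGC-guided, layer-to-prediction alignment could look like. The function names, the dense adjacency matrices, the mean-matching loss, and the toy setup are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of SGC-guided alignment (assumptions, not SimGC's exact code).
import torch
import torch.nn.functional as F

def alignment_loss(adj_real, x_real, adj_syn, x_syn, w, k=2):
    """Align the condensed graph (adj_syn, x_syn) with the original graph
    (adj_real, x_real) layer by layer, under the frozen weight matrix `w`
    of an SGC model pre-trained on the original graph."""
    loss = 0.0
    h_real, h_syn = x_real, x_syn
    for _ in range(k):
        # one step of SGC-style neighborhood aggregation on each graph
        h_real = adj_real @ h_real
        h_syn = adj_syn @ h_syn
        # match the mean aggregated representation at this layer
        loss = loss + F.mse_loss(h_syn.mean(0), h_real.mean(0))
    # also align the soft predictions produced by the frozen SGC head
    p_real = F.softmax(h_real @ w, dim=1).mean(0)
    p_syn = F.softmax(h_syn @ w, dim=1).mean(0)
    return loss + F.mse_loss(p_syn, p_real)

if __name__ == "__main__":
    # Toy sizes: 100 original nodes, 10 condensed nodes, 16 features, 3 classes.
    # adj_* would be symmetrically normalized adjacencies with self-loops in practice.
    n, m, d, c = 100, 10, 16, 3
    adj_real = torch.eye(n)                             # placeholder adjacency
    adj_syn = torch.softmax(torch.randn(m, m), dim=1)   # learnable in practice
    x_real = torch.randn(n, d)
    x_syn = torch.randn(m, d, requires_grad=True)       # the condensed features
    w = torch.randn(d, c)                               # frozen pre-trained SGC weights
    loss = alignment_loss(adj_real, x_real, adj_syn, x_syn, w)
    loss.backward()  # gradients flow only to the condensed graph, not to any GNN
```

Note the design point the abstract emphasizes: the only trainable object is the condensed graph itself; the SGC weights are fixed, so no GNN parameters are optimized during condensation.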