Graph condensation aims to reduce the size of a large-scale graph dataset by synthesizing a compact counterpart without sacrificing the performance of Graph Neural Networks (GNNs) trained on it, thereby lowering the computational cost of training GNNs. Nevertheless, existing methods often fail to replicate the original graph accurately on certain datasets and therefore do not achieve lossless condensation. To understand this phenomenon, we investigate its potential causes and find that the previous state-of-the-art trajectory matching method provides biased and restricted supervision signals from the original graph when optimizing the condensed one. This significantly limits both the scale and the efficacy of the condensed graph. In this paper, we make the first attempt toward \textit{lossless graph condensation} by incorporating these previously neglected supervision signals. Specifically, we employ a curriculum learning strategy to train expert trajectories with more diverse supervision signals from the original graph, and then transfer this information into the condensed graph through expanding window matching. Moreover, we design a loss function to further extract knowledge from the expert trajectories. Theoretical analysis justifies the design of our method, and extensive experiments verify its superiority across different datasets. Code is released at https://github.com/NUS-HPC-AI-Lab/GEOM.
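To make the idea of expanding window matching concrete, the following is a minimal sketch and not the released GEOM implementation. It assumes that the range of expert checkpoints from which a matching start point is sampled grows linearly over the condensation steps, and it uses the normalized parameter distance common in earlier trajectory matching work. The linear schedule, the function names (`window_upper_bound`, `sample_start_epoch`, `trajectory_matching_loss`), and all parameters are hypothetical choices made for illustration; the abstract does not specify them.

```python
import random


def window_upper_bound(step, total_steps, init_bound, max_bound):
    """Upper end of the start-epoch sampling window at a given condensation step.

    Assumption: the window expands linearly from init_bound to max_bound.
    """
    if total_steps <= 1:
        return max_bound
    # Clamp progress to [0, 1] so out-of-range steps stay inside the schedule.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return init_bound + int(frac * (max_bound - init_bound))


def sample_start_epoch(step, total_steps, init_bound, max_bound, rng=random):
    """Sample an expert checkpoint index from the current window [0, upper]."""
    upper = window_upper_bound(step, total_steps, init_bound, max_bound)
    return rng.randint(0, upper)  # randint includes both endpoints


def trajectory_matching_loss(student_end, expert_start, expert_target):
    """Normalized squared distance between student and expert parameters.

    student_end:   parameters after training on the condensed graph
    expert_start:  expert parameters at the sampled start checkpoint
    expert_target: expert parameters some epochs after the start checkpoint
    """
    num = sum((s - t) ** 2 for s, t in zip(student_end, expert_target))
    den = sum((a - t) ** 2 for a, t in zip(expert_start, expert_target))
    if den == 0:
        raise ValueError("expert_start and expert_target must differ")
    return num / den
```

Under this assumed schedule, early condensation steps match only the early segments of the expert trajectories, and later steps gradually admit later segments. The additional loss term mentioned in the abstract is not sketched here, since its form is not given.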