GNNs, like other deep learning models, are data- and computation-hungry. There is a pressing need to scale the training of GNNs on large datasets so that they can be used in low-resource environments. Graph distillation is an effort in this direction: it aims to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initial efforts are promising, this work is motivated by two key observations: (1) existing graph distillation algorithms themselves rely on training with the full dataset, which undermines the very premise of graph distillation; (2) the distillation process is specific to the target GNN architecture and hyper-parameters, and is thus not robust to changes in the modeling pipeline. We circumvent these limitations by designing Mirage, a distillation algorithm for graph classification. Mirage is built on the insight that a message-passing GNN decomposes the input graph into a multiset of computation trees. Furthermore, the frequency distribution of computation trees is often skewed, which enables us to condense this data into a concise distilled summary. By compressing the computation data itself, rather than emulating gradient flows on the original training set (the prevalent approach to date), Mirage becomes an unsupervised and architecture-agnostic distillation algorithm. Extensive benchmarking on real-world datasets underscores Mirage's superiority over state-of-the-art baselines in generalization accuracy, data compression, and distillation efficiency.
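To make the computation-tree idea concrete, here is a minimal sketch. It is not Mirage's published algorithm. It assumes node-labelled graphs given as adjacency dicts and encodes each node's depth-L computation tree (the unrolled L-hop neighbourhood a message-passing GNN aggregates over) as a canonical nested tuple. Children are sorted, mirroring permutation-invariant aggregation. It then exploits skew with a simple stand-in rule: keep the most frequent trees until they cover a fraction `mass` of all tree occurrences. The names `computation_trees`, `distill`, and `mass` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: (adjacency list, node labels).
Graph = Tuple[Dict[int, List[int]], Dict[int, str]]


def computation_trees(graph: Graph, depth: int) -> Dict[int, tuple]:
    """Canonical encoding of each node's depth-`depth` computation tree.

    Level 0 is the node label alone. Each further level attaches the sorted
    trees of all neighbours (revisiting the parent, as GNN unrolling does).
    """
    adj, labels = graph
    trees = {v: (labels[v], ()) for v in adj}
    for _ in range(depth):
        # The comprehension reads the previous level's `trees` before rebinding.
        trees = {v: (labels[v], tuple(sorted(trees[u] for u in adj[v]))) for v in adj}
    return trees


def tree_multiset(graph: Graph, depth: int) -> Counter:
    """The multiset of computation trees a depth-`depth` GNN sees in `graph`."""
    return Counter(computation_trees(graph, depth).values())


def distill(graphs: List[Graph], depth: int, mass: float = 0.8) -> Dict[tuple, int]:
    """Keep the most frequent trees until they cover `mass` of all occurrences
    (an illustrative frequency cut-off, assumed here for simplicity)."""
    total: Counter = Counter()
    for g in graphs:
        total.update(tree_multiset(g, depth))
    n = sum(total.values())
    kept, covered = {}, 0
    for tree, count in total.most_common():
        if covered >= mass * n:
            break
        kept[tree] = count
        covered += count
    return kept


# Usage: two identical triangles and one C-O-C path.
triangle: Graph = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "C", 1: "C", 2: "C"})
path: Graph = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
summary = distill([triangle, triangle, path], depth=1)
print(summary)
```

In this toy dataset, 9 node-level trees collapse to 3 distinct ones. The two most frequent, the triangle tree (6 occurrences) and the path's end-node tree (2), already cover at least 80% of the mass. This illustrates how a skewed tree distribution admits a compact summary without training any GNN.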