On Representation Knowledge Distillation for Graph Neural Networks

From arXiv; IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications

Knowledge distillation is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships defined over edges across the student and teacher's node embeddings. This paper studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving variant of LSP) as well as baselines from 2D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other. Our code is available at https://github.com/chaitjo/efficient-gnns
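
To make the idea of contrastive alignment concrete, here is a minimal sketch of an InfoNCE-style objective of the kind the abstract describes: each node's student embedding is pulled toward the teacher embedding of the same node and pushed away from the teacher embeddings of all other nodes. This is not the authors' implementation (see their repository for that). The function name `contrastive_distillation_loss`, the `temperature` parameter, and the toy vectors are assumptions made for illustration. The sketch also assumes both sets of embeddings have already been projected to the same dimensionality. It uses only the Python standard library so it runs anywhere; a practical version would use a tensor library and mini-batches of nodes.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss over node embeddings (illustrative sketch).

    student, teacher: lists of equal length; row i holds node i's embedding.
    For each node i, the positive pair is (student[i], teacher[i]) and the
    negatives are (student[i], teacher[j]) for every other node j.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher need the same non-zero number of nodes")

    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]

    total = 0.0
    for i, s_i in enumerate(s):
        # Cosine similarity of student node i to every teacher node, scaled.
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching teacher node as the target class.
        total += log_denominator - logits[i]
    return total / len(s)


# Toy data (hypothetical): three nodes with 2-dimensional embeddings.
teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]   # student close to the teacher
shuffled = [aligned[1], aligned[2], aligned[0]]    # same vectors, wrong nodes

loss_aligned = contrastive_distillation_loss(aligned, teacher)
loss_shuffled = contrastive_distillation_loss(shuffled, teacher)
print(f"aligned: {loss_aligned:.4f}  shuffled: {loss_shuffled:.4f}")
```

The loss is lower when each student embedding sits nearest its own teacher embedding than when the same vectors are assigned to the wrong nodes. Because every node is contrasted against all others rather than only its graph neighbours, the objective does not depend on the edge list, which is one way to read the abstract's claim that global topology is preserved implicitly.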
