Graph Neural Networks (GNNs) have proven versatile across a variety of applications, including recommendation systems, fake news detection, drug discovery, and even computer vision. As graph-structured data has grown in size, GNN models have grown in complexity, leading to substantial latency. This latency is primarily attributed to the irregular structure of graph data and its memory access patterns. The natural way to reduce latency is to compress large GNNs into small GNNs, and one way to do this is knowledge distillation (KD). However, most KD approaches for GNNs consider only the outputs of the last layer and ignore the outputs of the intermediate layers, which may contain important inductive biases implied by the graph structure. To address this shortcoming, we propose a novel KD approach to GNN compression that we call Attention-Based Knowledge Distillation (ABKD). ABKD uses attention to identify important intermediate teacher-student layer pairs and focuses on aligning their outputs. ABKD enables higher compression of GNNs with a smaller accuracy drop than existing KD approaches. On OGBN-Mag, a large graph dataset, we achieve on average a 1.79% increase in accuracy at a 32.3x compression ratio compared to state-of-the-art approaches.
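To make the idea of attention over intermediate teacher-student layer pairs concrete, the following is a minimal NumPy sketch, not the ABKD formulation itself. It assumes that each layer is summarized by its mean node embedding, that attention scores are scaled dot products between those summaries, and that alignment is measured by mean squared error in a shared space. The function name `attention_alignment_loss` and the projection matrices `w_t` and `w_s` are hypothetical and introduced only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_alignment_loss(teacher_feats, student_feats, w_t, w_s):
    """Attention-weighted alignment of intermediate layer outputs (illustrative).

    teacher_feats: list of (n, d_t) arrays, one per teacher layer
    student_feats: list of (n, d_s) arrays, one per student layer
    w_t: (d_t, d) projection, w_s: (d_s, d) projection (both hypothetical)
    Returns (loss, attn), where attn has shape (num_student, num_teacher).
    """
    # Project every layer output into a shared d-dimensional space.
    t_proj = np.stack([h @ w_t for h in teacher_feats])  # (Lt, n, d)
    s_proj = np.stack([h @ w_s for h in student_feats])  # (Ls, n, d)

    # Assumption: summarize each layer by its mean node embedding.
    t_key = t_proj.mean(axis=1)    # (Lt, d)
    s_query = s_proj.mean(axis=1)  # (Ls, d)

    # Scaled dot-product scores; each student layer attends over teacher layers.
    scores = s_query @ t_key.T / np.sqrt(t_key.shape[1])  # (Ls, Lt)
    attn = softmax(scores, axis=1)

    # Mean squared distance for every (student layer, teacher layer) pair.
    diff = s_proj[:, None] - t_proj[None, :]   # (Ls, Lt, n, d)
    pair_dist = (diff ** 2).mean(axis=(2, 3))  # (Ls, Lt)

    # Pairs with higher attention contribute more to the alignment loss.
    loss = float((attn * pair_dist).sum() / len(student_feats))
    return loss, attn


# Usage with random placeholder features: 3 teacher layers, 2 student layers.
rng = np.random.default_rng(0)
n, d_t, d_s, d = 5, 8, 4, 6
teacher = [rng.normal(size=(n, d_t)) for _ in range(3)]
student = [rng.normal(size=(n, d_s)) for _ in range(2)]
w_t = rng.normal(size=(d_t, d))
w_s = rng.normal(size=(d_s, d))
loss, attn = attention_alignment_loss(teacher, student, w_t, w_s)
print(attn.shape)  # (2, 3)
```

In a training setup, a term of this kind would typically be added to the usual last-layer distillation loss, with the projections learned jointly with the student; those choices are assumptions of this sketch rather than details stated above.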