Knowledge distillation (KD) has been shown to effectively boost the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parameterization and over-smoothing issues, leading to ineffective knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs that exchange knowledge with each other. Observing that a typical GNN model often performs better on some nodes and worse on others during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions are propagated. Additionally, considering that different augmented graphs can potentially capture distinct perspectives of the graph data, we propose FreeKD-Prompt, which learns undistorted and diverse augmentations based on prompt learning for exchanging varied knowledge. Furthermore, instead of confining knowledge exchange to two GNNs, we develop FreeKD++ to enable free-direction knowledge transfer among multiple GNNs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin. More surprisingly, FreeKD achieves performance comparable to, or even better than, traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
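To make the node-level action concrete, the following is a minimal sketch of per-node, free-direction distillation between two networks. It is not the FreeKD algorithm itself: the abstract states that directions are chosen via reinforcement learning, whereas this sketch assumes a simple stand-in heuristic (the network with the lower cross-entropy on a labeled node acts as the teacher for that node). The function names, the toy logits, and the use of KL divergence as the distillation loss are all illustrative assumptions, and the structure-level action is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label."""
    return -math.log(probs[label] + 1e-12)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + 1e-12) / (qi + 1e-12)) for pi, qi in zip(p, q))


def node_level_directions(logits_a, logits_b, labels):
    """Return one direction per node: 0 = A teaches B, 1 = B teaches A.

    ASSUMPTION: a heuristic stand-in for a learned policy. The network
    with the lower cross-entropy on a node is treated as the teacher.
    """
    directions = []
    for la, lb, y in zip(logits_a, logits_b, labels):
        ce_a = cross_entropy(softmax(la), y)
        ce_b = cross_entropy(softmax(lb), y)
        directions.append(0 if ce_a <= ce_b else 1)
    return directions


def free_direction_kd_losses(logits_a, logits_b, directions):
    """Return (loss_a, loss_b): the distillation loss each network receives,
    averaged over all nodes. A network only receives a loss on nodes where
    it is the student."""
    loss_a = 0.0
    loss_b = 0.0
    for la, lb, d in zip(logits_a, logits_b, directions):
        pa, pb = softmax(la), softmax(lb)
        if d == 0:
            # A teaches B: B's prediction is pulled toward A's.
            loss_b += kl_divergence(pa, pb)
        else:
            # B teaches A: A's prediction is pulled toward B's.
            loss_a += kl_divergence(pb, pa)
    n = len(directions)
    return loss_a / n, loss_b / n


# Toy example: three nodes, two classes (hypothetical values).
logits_a = [[2.0, 0.1], [0.2, 0.3], [1.5, -1.0]]
logits_b = [[0.1, 0.4], [-1.0, 2.0], [1.5, -1.0]]
labels = [0, 1, 0]

directions = node_level_directions(logits_a, logits_b, labels)
loss_a, loss_b = free_direction_kd_losses(logits_a, logits_b, directions)
print(directions, loss_a, loss_b)
```

In this toy setting, network A is more confident on the first node and network B on the second, so knowledge flows in opposite directions on those two nodes. In a training loop, each of these losses would be added to the corresponding network's supervised loss, with the teacher-side probabilities treated as constants.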