Knowledge distillation (KD) techniques have emerged as a powerful tool for transferring expertise from complex teacher models to lightweight student models, which is particularly beneficial for deploying high-performance models on resource-constrained devices. This approach has been successfully applied to graph neural networks (GNNs), leveraging their expressive power to generate node embeddings that capture both structural and feature information. In this study, we depart from the conventional KD approach by exploring the potential of collaborative learning among GNNs. In the absence of a pre-trained teacher model, we show that relatively simple and shallow GNN architectures can learn synergistically, yielding efficient models that perform better at inference, particularly when tackling multiple tasks. We propose a collaborative learning framework in which ensembles of student GNNs mutually teach each other throughout the training process. We introduce an adaptive logit weighting unit to facilitate efficient knowledge exchange among models and an entropy enhancement technique to improve mutual learning. These components allow the models to adapt their learning strategies dynamically during training, optimizing their performance on downstream tasks. Extensive experiments on three node classification and three graph classification datasets demonstrate the effectiveness of our approach.
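The sketch below illustrates the general idea of teacher-free mutual learning between student GNNs described above: each student is trained on the ground-truth labels plus a distillation term toward its peers' logits, with a per-node weight derived from the peer's prediction entropy standing in for the adaptive logit weighting unit. The class and function names, the two-layer dense GCN, and the confidence-based weighting scheme are illustrative assumptions, not the exact formulation proposed in the paper.

```python
# Minimal sketch of mutual (online) distillation between shallow GNN students.
# Names and the entropy-based weighting are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCN(nn.Module):
    """Two-layer GCN over a dense, row-normalised adjacency matrix."""

    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, n_classes)

    def forward(self, x, adj):
        h = F.relu(adj @ self.lin1(x))
        return adj @ self.lin2(h)  # node logits


def peer_weight(peer_logits):
    """Assumed adaptive weight: trust a peer more where its predictions
    are low-entropy (confident). One of many possible weighting schemes."""
    probs = peer_logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)       # per node
    max_ent = torch.log(torch.tensor(float(peer_logits.size(-1))))
    return (1.0 - entropy / max_ent).detach()                       # in [0, 1]


def mutual_step(students, opts, x, adj, y, T=2.0):
    """One step where each student learns from the labels and its peers."""
    all_logits = [s(x, adj) for s in students]
    for i, (student, opt) in enumerate(zip(students, opts)):
        loss = F.cross_entropy(all_logits[i], y)
        for j, peer_logits in enumerate(all_logits):
            if j == i:
                continue
            w = peer_weight(peer_logits)                             # node-wise weight
            kl = F.kl_div(
                F.log_softmax(all_logits[i] / T, dim=-1),
                F.softmax(peer_logits.detach() / T, dim=-1),
                reduction="none",
            ).sum(-1)
            loss = loss + (w * kl).mean() * (T * T)
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    # Toy random graph: 6 nodes, 4 features, 2 classes.
    n, d, c = 6, 4, 2
    x = torch.randn(n, d)
    adj = torch.eye(n) + (torch.rand(n, n) > 0.7).float()
    adj = adj / adj.sum(-1, keepdim=True)                            # row-normalise
    y = torch.randint(0, c, (n,))
    students = [SimpleGCN(d, 8, c) for _ in range(2)]
    opts = [torch.optim.Adam(s.parameters(), lr=1e-2) for s in students]
    for _ in range(5):
        mutual_step(students, opts, x, adj, y)
```

Computing all student outputs before any update, as done here, keeps the mutual targets consistent within a step; the entropy-enhancement component of the framework is not shown.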