How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode the textual information of graphs achieve state-of-the-art performance on many node classification tasks. Yet combining GNNs with LMs has not been widely explored for practical deployment because of scalability issues. In this work, we tackle this challenge by developing a Graph-Aware Distillation framework (GRAD) that encodes graph structure into an LM for graph-free, fast inference. Unlike conventional knowledge distillation, GRAD jointly optimizes a GNN teacher and a graph-free student over the graph's nodes via a shared LM. This encourages the graph-free student to exploit graph information encoded by the GNN teacher while also enabling the GNN teacher to better leverage textual information from unlabeled nodes. As a result, the teacher and student models learn from each other, improving their overall performance. Experiments on eight node classification benchmarks, in both transductive and inductive settings, showcase GRAD's superiority over existing distillation approaches for textual graphs.
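The joint optimization described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not give the loss, so the form used here (cross-entropy for teacher and student on labeled nodes, plus a teacher-to-student KL term on all nodes) is an assumption. The names `shared_encoder`, `teacher_logits`, `student_logits`, `joint_loss`, and `kd_weight` are hypothetical. A linear map stands in for the shared LM, and mean aggregation over neighbors stands in for the GNN.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shared_encoder(features, weights):
    # Stand-in for the shared LM (assumption): a linear map from a node's
    # text features to one score per class.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def teacher_logits(node, embeddings, neighbors):
    # Stand-in for the GNN teacher (assumption): mean of the node's own
    # encoding and its neighbors' encodings, so it needs the graph.
    group = [node] + neighbors[node]
    dim = len(embeddings[node])
    return [sum(embeddings[n][d] for n in group) / len(group) for d in range(dim)]

def student_logits(node, embeddings):
    # Graph-free student: uses only the node's own encoding.
    return embeddings[node]

def joint_loss(features, weights, neighbors, labels, kd_weight=1.0):
    # Both models read the same encoder output, so one set of weights
    # receives signal from the supervised and the distillation terms.
    embeddings = {n: shared_encoder(x, weights) for n, x in features.items()}
    supervised, distill = 0.0, 0.0
    for node in features:
        p_teacher = softmax(teacher_logits(node, embeddings, neighbors))
        p_student = softmax(student_logits(node, embeddings))
        if node in labels:  # supervised terms on labeled nodes only
            supervised += cross_entropy(p_teacher, labels[node])
            supervised += cross_entropy(p_student, labels[node])
        # The distillation term also covers unlabeled nodes.
        distill += kl_divergence(p_teacher, p_student)
    return supervised + kd_weight * distill, supervised, distill

# Toy textual graph: three nodes, node 2 is unlabeled.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
weights = [[1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: 0, 1: 1}

total, supervised, distill = joint_loss(features, weights, neighbors, labels)
print(total, supervised, distill)
```

In this sketch the unlabeled node contributes only through the distillation term. This mirrors the abstract's point that joint training lets textual information from unlabeled nodes reach the shared encoder. At inference time only `student_logits` would be needed, so no neighbor lookup is required.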