Recent advances in large language models (LLMs) have revolutionized natural language processing (NLP). Inspired by this success, recent work has begun investigating the potential of LLMs in graph learning tasks. However, most existing work uses LLMs as powerful node feature augmenters, leaving the use of LLMs to enhance graph topology understudied. In this work, we explore how to leverage the information retrieval and text generation capabilities of LLMs to refine the topological structure of text-attributed graphs (TAGs) in the node classification setting. First, we propose using LLMs to help remove unreliable edges and add reliable ones in the TAG. Specifically, we use carefully designed prompts to have the LLM output the semantic similarity between node attributes, and then perform edge deletion and edge addition based on that similarity. Second, we propose using pseudo-labels generated by the LLM to improve graph topology: we introduce pseudo-label propagation as a regularizer that guides the graph neural network (GNN) in learning proper edge weights. Finally, we incorporate both LLM-based topology refinement methods into GNN training and perform extensive experiments on four real-world datasets. The experimental results demonstrate the effectiveness of LLM-based graph topology refinement, with a performance gain of 0.15% to 2.47% on public benchmarks.
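
To make the two ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the LLM's similarity outputs are already available through a hypothetical `similarity` callable, and the thresholds `delete_below` and `add_above` are illustrative values that the abstract does not specify. The second function, `label_disagreement_penalty`, is a simplified stand-in for the pseudo-label propagation regularizer, whose exact form is not given here: it only measures how much edge weight connects nodes with different pseudo-labels.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def canonical(u: int, v: int) -> Edge:
    """Store an undirected edge with the smaller node id first."""
    return (u, v) if u < v else (v, u)


def refine_edges(
    num_nodes: int,
    edges: Iterable[Edge],
    similarity: Callable[[int, int], float],
    delete_below: float = 0.2,  # assumed threshold, not taken from the paper
    add_above: float = 0.8,     # assumed threshold, not taken from the paper
) -> Set[Edge]:
    """Drop existing edges with low similarity and add missing edges with high similarity."""
    current = {canonical(u, v) for u, v in edges}
    # Edge deletion: keep only edges whose endpoints are similar enough.
    refined = {e for e in current if similarity(*e) >= delete_below}
    # Edge addition: scoring every pair is quadratic, so this is a toy-scale loop;
    # a real pipeline would restrict the candidate pairs first.
    for u, v in combinations(range(num_nodes), 2):
        if (u, v) not in current and similarity(u, v) > add_above:
            refined.add((u, v))
    return refined


def label_disagreement_penalty(
    weights: Dict[Edge, float], pseudo_labels: Dict[int, int]
) -> float:
    """Fraction of total edge weight that links nodes with different pseudo-labels."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    disagreeing = sum(
        w for (u, v), w in weights.items() if pseudo_labels[u] != pseudo_labels[v]
    )
    return disagreeing / total


# Toy data standing in for LLM outputs (hypothetical values).
scores = {(0, 1): 0.9, (0, 2): 0.1, (0, 3): 0.85, (1, 2): 0.1, (1, 3): 0.5, (2, 3): 0.7}
refined = refine_edges(4, [(0, 1), (1, 2), (2, 3)], lambda u, v: scores[canonical(u, v)])
print(sorted(refined))  # [(0, 1), (0, 3), (2, 3)]
```

In this toy run the low-similarity edge (1, 2) is removed and the high-similarity pair (0, 3) is added. A penalty like the one above could be added to the training loss so that learned edge weights are discouraged from concentrating on edges whose endpoints the LLM labels differently; how the regularizer is weighted against the classification loss is left open in this sketch.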