Text-rich graphs, whose nodes and edges carry rich textual information, are prevalent across a wide range of real-world business applications. Large Language Models (LLMs) have demonstrated remarkable abilities in understanding text, which opens the door to more expressive modeling of text-rich graphs. Despite these capabilities, efficiently applying LLMs to representation learning on graphs presents significant challenges. Recently, parameter-efficient fine-tuning methods for LLMs have enabled generalization to new tasks with minimal time and memory consumption. Inspired by this, we introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning with LLMs on text-rich graphs. Specifically, we utilize a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt. This prompt is then inserted at the beginning of the text sequence. To improve the quality of graph prompts, we pre-train the GNN to assist the frozen LLM in predicting the next token in the node text. Compared with existing joint GNN and LM approaches, our method generates node embeddings directly from large language models at an affordable fine-tuning cost. We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations. Our results demonstrate the efficacy and efficiency of our model, showing that it can be smoothly integrated with various large language models, including OPT, LLaMA and Falcon.
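The graph-prompt idea described above can be illustrated with a minimal sketch. This is not the GPEFT implementation: mean aggregation stands in for the GNN, a fixed matrix stands in for learned projection weights, and the function names (`mean_aggregate`, `project`, `prepend_graph_prompt`) and toy dimensions are all hypothetical, chosen only to show how a single prompt vector built from neighbors could be placed at the start of a token-embedding sequence.

```python
from typing import List

Vector = List[float]


def mean_aggregate(neighbor_embeddings: List[Vector]) -> Vector:
    """Average the neighbor embeddings (an assumed stand-in for a GNN layer)."""
    if not neighbor_embeddings:
        raise ValueError("need at least one neighbor embedding")
    dim = len(neighbor_embeddings[0])
    count = len(neighbor_embeddings)
    return [sum(vec[i] for vec in neighbor_embeddings) / count for i in range(dim)]


def project(vector: Vector, weight: List[Vector]) -> Vector:
    """Linear map into the token-embedding space; weight has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def prepend_graph_prompt(graph_prompt: Vector,
                         token_embeddings: List[Vector]) -> List[Vector]:
    """Insert the graph prompt as one extra position at the start of the sequence."""
    return [graph_prompt] + token_embeddings


# Toy data (hypothetical): two neighbors with 2-d embeddings,
# and a node text of two tokens with 3-d token embeddings.
neighbors = [[1.0, 3.0], [3.0, 5.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # assumed fixed 3x2 projection
tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

graph_prompt = project(mean_aggregate(neighbors), weight)
sequence = prepend_graph_prompt(graph_prompt, tokens)
print(len(sequence), sequence[0])
```

In a real setup the resulting sequence would be passed to the language model in place of the plain token embeddings. Following the description above, the aggregation and projection would be learned (the GNN) while the LLM weights stay frozen during pre-training.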