Applying large language models (LLMs) to graph learning is an emerging endeavor. However, the vast amount of information inherent in large graphs poses significant challenges to this process. This work focuses on the link prediction task and introduces $\textbf{LPNL}$ (Link Prediction via Natural Language), an LLM-based framework designed for scalable link prediction on large-scale heterogeneous graphs. We design novel prompts for link prediction that articulate graph details in natural language. To address the challenge of overwhelming information, we propose a two-stage sampling pipeline that extracts crucial information from the graph, and a divide-and-conquer strategy that keeps the number of input tokens within predefined limits. We fine-tune a T5 model using a self-supervised learning scheme designed for link prediction. Extensive experimental results demonstrate that LPNL outperforms multiple advanced baselines on link prediction tasks over large-scale graphs.
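The abstract does not spell out how the divide-and-conquer strategy works, so the following is only a minimal sketch of one plausible reading: the candidate set is split into groups small enough to fit in one prompt, a model picks the most likely target in each group, and the winners are pooled and reduced again until a single candidate remains. The names `divide_and_conquer_select`, `pick_best`, and `group_size` are hypothetical and do not come from the paper; `pick_best` stands in for a call to the fine-tuned model.

```python
from typing import Callable, List, Sequence


def divide_and_conquer_select(
    candidates: Sequence[str],
    pick_best: Callable[[List[str]], str],
    group_size: int,
) -> str:
    """Reduce a candidate set to one answer.

    pick_best is never called with more than group_size candidates,
    which is how the prompt length stays bounded in this sketch.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not candidates:
        raise ValueError("candidates must be non-empty")

    pool = list(candidates)
    while len(pool) > 1:
        # Divide: split the current pool into bounded-size groups.
        groups = [pool[i:i + group_size] for i in range(0, len(pool), group_size)]
        # Conquer: keep one winner per group (singletons pass through).
        pool = [pick_best(g) if len(g) > 1 else g[0] for g in groups]
    return pool[0]


# Toy stand-in for the model (assumption): prefer the highest score.
toy_scores = {f"node_{i}": i for i in range(10)}


def toy_pick_best(group: List[str]) -> str:
    return max(group, key=toy_scores.get)


winner = divide_and_conquer_select(list(toy_scores), toy_pick_best, group_size=3)
```

In this sketch the number of model calls grows roughly linearly with the number of candidates, while each individual call sees at most `group_size` of them.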