Text-attributed graphs have recently attracted significant attention owing to their wide range of applications in web domains. Existing methods employ word embedding models to obtain text representations as node features, which are then fed into Graph Neural Networks (GNNs) for training. Recently, Large Language Models (LLMs) have demonstrated powerful capabilities in information retrieval and text generation, which can greatly enrich the text attributes of graph data. Meanwhile, acquiring and labeling large datasets is costly and time-consuming, making few-shot learning a crucial problem in graph learning tasks. To tackle this challenge, we propose a lightweight, plug-and-play paradigm called LLM4NG, which empowers text-attributed graphs through node generation with LLMs. Specifically, we use an LLM to extract semantic information from the class labels and generate exemplar samples belonging to those classes. We then employ an edge predictor that captures the structural information of the raw dataset to integrate the newly generated samples into the original graph. This approach harnesses LLMs to enrich class-level information and seamlessly introduces labeled nodes and edges without modifying the raw dataset, thereby facilitating node classification in few-shot scenarios. Extensive experiments demonstrate the strong performance of the proposed paradigm, particularly in low-shot settings; for instance, in the 1-shot setting on the ogbn-arxiv dataset, LLM4NG achieves a 76% improvement over the baseline model.
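The pipeline described above — generating labeled exemplars per class and wiring them into the graph with an edge predictor — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `generate_class_exemplars` is a hypothetical stand-in for the LLM prompt, `embed` is a toy deterministic embedding in place of a word-embedding model, and cosine top-k similarity substitutes for LLM4NG's trained edge predictor.

```python
import hashlib
import numpy as np

def generate_class_exemplars(labels, n_per_class=2):
    # Hypothetical stand-in for the LLM call: in LLM4NG, the LLM is prompted
    # with each class label and asked to write sample texts of that class.
    return {lab: [f"sample text {i} about {lab}" for i in range(n_per_class)]
            for lab in labels}

def embed(text, dim=16):
    # Toy deterministic embedding standing in for a word-embedding model.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def attach_generated_nodes(raw_embs, gen_embs, k=2):
    # Illustrative edge predictor: connect each generated node to its k most
    # similar raw nodes by cosine similarity (embeddings are unit-norm).
    # LLM4NG instead trains an edge predictor on the raw graph's structure.
    edges = []
    for gi, g in enumerate(gen_embs):
        sims = raw_embs @ g
        for ri in np.argsort(-sims)[:k]:
            edges.append((f"gen{gi}", f"raw{int(ri)}"))
    return edges
```

The generated nodes arrive already labeled (each carries the class whose label prompted it), so appending them plus the predicted edges to the original graph adds supervision for few-shot node classification without altering any raw node or edge.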