With the increasing prevalence of cross-domain Text-Attributed Graph (TAG) data (e.g., citation networks, recommendation systems, social networks, and AI4Science), integrating Graph Neural Networks (GNNs) and Large Language Models (LLMs) into a unified model architecture (e.g., LLM as enhancer, LLM as collaborator, LLM as predictor) has emerged as a promising technological paradigm. The core of this new graph learning paradigm lies in the synergy between GNNs' ability to capture complex structural relationships and LLMs' proficiency in extracting informative context from the rich textual descriptions attached to graphs. We can therefore leverage semantically rich graph description texts to fundamentally improve data quality and, in line with data-centric machine learning principles, strengthen the representational capacity of model-centric approaches. By combining the strengths of these distinct neural architectures, this integrated approach addresses a wide range of TAG-based tasks (e.g., graph learning, graph reasoning, and graph question answering), particularly in complex industrial scenarios (e.g., supervised, few-shot, and zero-shot settings). In other words, text serves as a medium for cross-domain generalization of graph learning models, allowing a single graph model to effectively handle diverse downstream graph-based tasks across different data domains. This work serves as a foundational reference for researchers and practitioners seeking to advance graph learning methodologies in the rapidly evolving landscape of LLMs. We continually maintain the related open-source materials at \url{https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers}.