Despite the impressive advances of Large Language Models (LLMs) in text generation, they are often limited by the knowledge contained in their input and are prone to producing inaccurate or hallucinated content. To address these issues, Retrieval-Augmented Generation (RAG) is employed as an effective strategy to enlarge the available knowledge base and ground responses in reality by retrieving additional texts from external databases. In real-world applications, texts are often linked through entities within a graph, such as citations among academic papers or comments in social networks. This paper exploits these topological relationships to guide the retrieval process in RAG. Specifically, we explore two kinds of topological connections: proximity-based, which focuses on closely connected nodes, and role-based, which focuses on nodes sharing similar subgraph structures. Our empirical analysis confirms that both are relevant to text relationships, which leads us to develop a Topology-aware Retrieval-augmented Generation framework. The framework includes a retrieval module that selects texts based on their topological relationships and an aggregation module that integrates these texts into prompts for LLM text generation. We curate established text-attributed networks and conduct comprehensive experiments to validate the effectiveness of this framework, demonstrating its potential to enhance RAG with topological awareness.
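To make the two retrieval modes concrete, the following is a minimal sketch on a toy graph, not the framework's actual implementation. It assumes hop distance (breadth-first search) as a stand-in for proximity and a simple (degree, mean neighbor degree) signature as a stand-in for structural role; the abstract does not specify the measures used. All names here (`retrieve_by_proximity`, `retrieve_by_role`, `build_prompt`, the toy `adj` and `texts`) are hypothetical.

```python
from collections import deque

# Toy text-attributed graph (hypothetical): two hubs joined by a bridge node.
adj = {
    "h1": ["a", "b", "m"], "h2": ["d", "e", "m"], "m": ["h1", "h2"],
    "a": ["h1"], "b": ["h1"], "d": ["h2"], "e": ["h2"],
}
texts = {n: f"text of node {n}" for n in adj}


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def retrieve_by_proximity(adj, target, k):
    """Assumed proximity measure: the k nodes with the fewest hops to target."""
    dist = hop_distances(adj, target)
    candidates = [n for n in dist if n != target]
    return sorted(candidates, key=lambda n: (dist[n], n))[:k]


def role_signature(adj, node):
    """Assumed role descriptor: (degree, mean degree of neighbors)."""
    deg = len(adj[node])
    mean_nbr = sum(len(adj[v]) for v in adj[node]) / deg if deg else 0.0
    return (deg, mean_nbr)


def retrieve_by_role(adj, target, k):
    """The k nodes whose role signature is closest (L1 distance) to target's."""
    t = role_signature(adj, target)

    def gap(n):
        s = role_signature(adj, n)
        return abs(s[0] - t[0]) + abs(s[1] - t[1])

    candidates = [n for n in adj if n != target]
    return sorted(candidates, key=lambda n: (gap(n), n))[:k]


def build_prompt(texts, target, retrieved):
    """Aggregation step: place the retrieved texts ahead of the target text."""
    context = "\n".join(f"- {texts[n]}" for n in retrieved)
    return f"Related texts:\n{context}\n\nTarget text:\n{texts[target]}"


near = retrieve_by_proximity(adj, "h1", 2)   # direct neighbors of h1
alike = retrieve_by_role(adj, "h1", 1)       # the other hub, two hops away
prompt = build_prompt(texts, "h1", near + alike)
```

On this toy graph the two modes return different nodes: proximity picks direct neighbors of `h1`, while the role signature picks the other hub `h2`, which is not adjacent to `h1` but occupies the same structural position. A real system would likely replace both measures with stronger ones and cap the prompt length.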