Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing large language models (LLMs) with external knowledge, yet existing graph-based methods face a fundamental limitation: both entity-centric and chunk-centric approaches operate on representations anchored to the original text, without true knowledge fusion. Entity-centric methods connect logically related content and chunk-centric methods preserve context, but both retrieve information separately through similarity search and so miss the emergent understanding that arises from synthesizing the two. In this paper, we propose HyGRAG, a hierarchical graph RAG framework that goes beyond the source documents by addressing three core challenges: constructing summaries that genuinely integrate contextual and relational information, leveraging these synthesized representations to access emergent knowledge during retrieval, and efficiently updating hierarchical structures for dynamic corpora. Specifically, we build a hierarchical index over hybrid graphs containing both chunk and entity nodes by iteratively clustering the nodes and generating LLM-based summaries of each cluster. We then design context- and relation-aware retrieval that searches across all abstraction levels while expanding through community membership. Moreover, we support dynamic knowledge updates through attachment-based algorithms that require only local re-summarization. Experimental results show that HyGRAG improves average accuracy on multi-hop reasoning tasks by 9.7% while maintaining reasonable efficiency.
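To make the three components above concrete, the following is a minimal illustrative sketch and not the HyGRAG implementation. Everything in it is an assumption made for illustration: the `Node` type, the greedy cosine-threshold clustering (the abstract does not specify the clustering algorithm), the `summarize` callback standing in for an LLM call, centroid vectors standing in for summary embeddings, and the function names `build_index`, `retrieve`, and `attach`.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    node_id: str
    level: int                 # 0 = chunk or entity node, >= 1 = summary node
    text: str
    vector: list[float]        # embedding (hypothetical; supplied by the caller)
    children: list[str] = field(default_factory=list)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_level(nodes, level, threshold, summarize):
    """Greedy clustering (an assumption): join the first cluster whose
    centroid is similar enough, otherwise start a new cluster."""
    clusters = []
    for node in nodes:
        for members in clusters:
            if cosine(node.vector, mean_vector([m.vector for m in members])) >= threshold:
                members.append(node)
                break
        else:
            clusters.append([node])
    # One summary node per cluster; `summarize` stands in for an LLM call.
    return [
        Node(f"L{level}-{i}", level, summarize([m.text for m in members]),
             mean_vector([m.vector for m in members]), [m.node_id for m in members])
        for i, members in enumerate(clusters)
    ]


def build_index(base_nodes, depth, threshold, summarize):
    """Stack up to `depth` summary levels over the mixed chunk/entity nodes."""
    index = {n.node_id: n for n in base_nodes}
    current = list(base_nodes)
    for level in range(1, depth + 1):
        current = build_level(current, level, threshold, summarize)
        index.update({n.node_id: n for n in current})
        if len(current) <= 1:
            break
    return index


def retrieve(index, query_vector, top_k):
    """Rank nodes from every level together, then expand each hit to the
    members of its community (its direct children)."""
    ranked = sorted(index.values(), key=lambda n: cosine(query_vector, n.vector), reverse=True)
    result = []
    for hit in ranked[:top_k]:
        for node_id in [hit.node_id, *hit.children]:
            if node_id not in result:
                result.append(node_id)
    return result


def attach(index, new_node, summarize):
    """Attach a new base node to its most similar level-1 community and
    re-summarize only that community (higher levels are not refreshed here)."""
    index[new_node.node_id] = new_node
    parent = max((n for n in index.values() if n.level == 1),
                 key=lambda n: cosine(new_node.vector, n.vector))
    parent.children.append(new_node.node_id)
    members = [index[c] for c in parent.children]
    parent.text = summarize([m.text for m in members])
    parent.vector = mean_vector([m.vector for m in members])
    return parent.node_id
```

In this sketch, a call such as `build_index(nodes, depth=2, threshold=0.8, summarize=my_llm_summary)` would produce the index, `retrieve(index, query_vector, top_k=3)` would query it, and `attach(index, new_node, my_llm_summary)` would add a document incrementally; `my_llm_summary` is a hypothetical callable that maps a list of member texts to one summary string. Ranking all levels in a single pass is the simplest way to let a summary node outrank the raw chunks it was built from, which is the behavior the abstract attributes to searching across abstraction levels.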