UnWeaving the knots of GraphRAG -- turns out VectorRAG is almost enough

One of the key problems in retrieval-augmented generation (RAG) systems is that chunk-based retrieval pipelines represent source chunks as atomic objects, mixing all the information contained in a chunk into a single vector. These vector representations are then treated as isolated, independent, and self-sufficient, with no attempt to represent possible relations between them, so such an approach has no dedicated mechanism for handling multi-hop questions. Graph-based RAG systems aimed to ameliorate this problem by modeling information as knowledge graphs, in which entities, represented as nodes, are connected by robust relations and grouped into hierarchical communities. However, this approach suffers from its own issues, among them an orders-of-magnitude increase in computational complexity when building graph-based indices and a reliance on heuristics for performing retrieval. We propose UnWeaver, a novel RAG framework that simplifies the idea of GraphRAG. UnWeaver uses an LLM to disentangle the contents of documents into entities that can occur across multiple chunks. During retrieval, entities serve as an intermediate step for recovering the original text chunks, thereby preserving fidelity to the source material. We argue that entity-based decomposition yields a more distilled representation of the original information and also reduces noise in the indexing and generation processes. Furthermore, we show experimentally that, in end-to-end QA evaluation, VectorRAG performs better than standard GraphRAG and almost as well as current state-of-the-art graph-based solutions, at a fraction of the cost.
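The retrieval idea described above, where entities act as an intermediate index that points back to the original chunks, can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: the LLM extraction step is replaced by hand-written entity lists, a toy bag-of-words cosine similarity stands in for a real embedding model, and the names `build_entity_index`, `embed`, `cosine`, `retrieve`, and the parameter `k_entities` are hypothetical. It is not UnWeaver's implementation, only an illustration of the entity-to-chunk lookup pattern.

```python
import math
import re
from collections import Counter, defaultdict

# Toy corpus: chunk ID -> original chunk text (illustrative data only).
CHUNKS = {
    "c1": "Marie Curie was born in Warsaw.",
    "c2": "Warsaw is the capital of Poland.",
    "c3": "The Eiffel Tower is in Paris.",
}

# Hypothetical stand-in for the LLM extraction step: entities are listed by
# hand here. Note that "Warsaw" occurs in two different chunks.
CHUNK_ENTITIES = {
    "c1": ["Marie Curie", "Warsaw"],
    "c2": ["Warsaw", "Poland"],
    "c3": ["Eiffel Tower", "Paris"],
}


def build_entity_index(chunk_entities):
    """Invert chunk -> entities into entity -> set of chunk IDs."""
    index = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for entity in entities:
            index[entity].add(chunk_id)
    return dict(index)


def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, entity_index, k_entities=2):
    """Match the query against entities, then map the matched entities back
    to their source chunks and return the original, unmodified chunk text."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(entity)), entity) for entity in entity_index),
        reverse=True,
    )
    # Keep the top-k entities that have any similarity to the query.
    top = [(score, entity) for score, entity in scored[:k_entities] if score > 0]
    # A chunk's score is the sum of the scores of the matched entities it contains.
    chunk_scores = defaultdict(float)
    for score, entity in top:
        for chunk_id in entity_index[entity]:
            chunk_scores[chunk_id] += score
    ranked = sorted(chunk_scores, key=lambda c: (-chunk_scores[c], c))
    return [(chunk_id, chunks[chunk_id]) for chunk_id in ranked]


if __name__ == "__main__":
    index = build_entity_index(CHUNK_ENTITIES)
    for chunk_id, text in retrieve("Which country is Warsaw in?", CHUNKS, index):
        print(chunk_id, text)
    # c1 Marie Curie was born in Warsaw.
    # c2 Warsaw is the capital of Poland.
```

Because "Warsaw" is extracted from two chunks, a query that matches only that entity returns both original chunks verbatim, whereas a plain chunk-level index would score each chunk independently. Summing entity scores per chunk is just one simple ranking choice; other aggregation rules are equally plausible.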
