Traditionally, Retrieval-Augmented Generation (RAG) methods split text into chunks to enable language models to handle long documents. Recent tree-based RAG methods can retrieve detailed information while preserving global context. However, with the advent of more powerful LLMs such as Llama 3.1, which offer better comprehension and support for longer inputs, we find that even recent tree-based RAG methods perform worse than directly feeding the entire document into Llama 3.1, although RAG methods still hold an advantage in reducing computational costs. In this paper, we propose a new retrieval method, LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph (GARLIC), which outperforms previous state-of-the-art baselines, including Llama 3.1, while retaining the computational efficiency of RAG methods. Our method introduces several improvements: (1) Rather than using a tree structure, we construct a Hierarchical Weighted Directed Acyclic Graph with many-to-many summarization, where the graph edges are derived from attention mechanisms and each node focuses on a single event or a very small number of events. (2) We introduce a novel retrieval method that leverages the attention weights of LLMs rather than dense embedding similarity, allowing the graph to be searched along multiple paths and the search to terminate at any depth. (3) We use the LLM to control the retrieval process, enabling it to dynamically adjust the amount and depth of information retrieved for different queries. Experimental results show that our method outperforms previous state-of-the-art baselines, including Llama 3.1, on two single-document and two multi-document QA datasets, while maintaining computational complexity similar to that of traditional RAG methods.
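The multi-path retrieval with dynamic termination described in points (2) and (3) can be sketched as a best-first search over a weighted DAG. The toy graph, the hard-coded edge weights, and the `llm_wants_more` stopping rule below are illustrative stand-ins only: in the actual method, weights come from LLM attention and the stopping decision is made by the LLM itself.

```python
import heapq

# Toy hierarchical DAG: node id -> (summary text, [(child id, weight), ...]).
# Weights stand in for the attention-derived edge weights; real values would
# be extracted from the LLM's attention during many-to-many summarization.
GRAPH = {
    "root": ("whole-document summary", [("e1", 0.7), ("e2", 0.3)]),
    "e1":   ("event 1 summary",        [("c1", 0.9), ("c2", 0.4)]),
    "e2":   ("event 2 summary",        [("c3", 0.8)]),
    "c1":   ("source chunk 1", []),
    "c2":   ("source chunk 2", []),
    "c3":   ("source chunk 3", []),
}

def llm_wants_more(collected, budget=3):
    """Stand-in for LLM-guided dynamic progress control: here we simply
    stop after `budget` nodes; the real method queries the LLM."""
    return len(collected) < budget

def retrieve(graph, start="root"):
    """Greedy best-first search over the weighted DAG. Multiple paths can
    be explored in parallel, and the search may stop at any depth."""
    frontier = [(-1.0, start)]  # max-heap via negated weights
    visited, collected = set(), []
    while frontier and llm_wants_more(collected):
        neg_w, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        collected.append(node)
        for child, weight in graph[node][1]:
            heapq.heappush(frontier, (-weight, child))
    return collected

print(retrieve(GRAPH))  # → ['root', 'e1', 'c1']
```

Because the frontier is a single priority queue over all discovered nodes, the search is free to descend one branch deeply or spread across several branches, depending only on the edge weights and the stopping rule.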