Retrieval-Augmented Generation (RAG) is widely used with Large Language Models. It consists mainly of two components: retrieval modules (a.k.a. retrievers), which find useful information to support generation modules (a.k.a. generators). As such, generator performance largely depends on the effectiveness and efficiency of the retrievers. However, prevailing retrieval paradigms remain flat: they treat retrieval as a one-off procedure at a constant granularity. Despite their effectiveness, we argue that such paradigms suffer from two limitations: (1) flat retrieval places a significant burden on a single retriever; (2) constant granularity caps the achievable retrieval performance. In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, so as to balance effectiveness and efficiency. Specifically, FunnelRAG builds a progressive retrieval pipeline that couples coarse-to-fine granularity, large-to-small candidate quantity, and low-to-high model capacity, which relieves the burden on any single retriever and raises the ceiling of retrieval performance. Extensive experiments show that FunnelRAG achieves comparable retrieval performance while reducing time overhead by nearly 40 percent.
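To make the funnel idea concrete, here is a minimal sketch of a coarse-to-fine retrieval pipeline. The cluster layout, the `token_overlap` scorer, and the `funnel_retrieve` function are hypothetical stand-ins for illustration only, not the actual FunnelRAG components: an early stage ranks coarse units (document clusters) cheaply, and a later stage re-scores the smaller set of fine units (passages) that survive.

```python
# Illustrative sketch of a coarse-to-fine retrieval funnel (assumed design,
# not the actual FunnelRAG implementation). A cheap scorer prunes coarse
# units first; a second pass scores fine units from the survivors.

def token_overlap(query, text):
    """Cheap lexical scorer: fraction of query tokens appearing in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def funnel_retrieve(query, clusters, coarse_k=2, fine_k=2):
    """Stage 1: rank whole clusters (coarse granularity, large quantity).
       Stage 2: rank passages from surviving clusters (fine granularity,
       small quantity). Each stage shrinks the candidate set."""
    # Stage 1: score each cluster as one coarse unit.
    coarse = sorted(
        clusters,
        key=lambda c: token_overlap(query, " ".join(c["passages"])),
        reverse=True,
    )[:coarse_k]
    # Stage 2: score individual passages from the surviving clusters.
    candidates = [p for c in coarse for p in c["passages"]]
    return sorted(
        candidates,
        key=lambda p: token_overlap(query, p),
        reverse=True,
    )[:fine_k]

# Toy corpus (hypothetical data for demonstration).
clusters = [
    {"id": 0, "passages": ["the eiffel tower is in paris",
                           "paris is the capital of france"]},
    {"id": 1, "passages": ["mount fuji is in japan",
                           "tokyo is the capital of japan"]},
    {"id": 2, "passages": ["bananas are rich in potassium",
                           "apples grow on trees"]},
]

print(funnel_retrieve("what is the capital of france", clusters,
                      coarse_k=1, fine_k=1))
```

In a real system the stage-1 scorer would be a low-capacity but fast retriever (e.g. a sparse index) over large coarse units, and stage 2 a higher-capacity model over far fewer fine units, matching the large-to-small quantity and low-to-high capacity progression described above.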