Parametric Retrieval-Augmented Generation (PRAG) is a RAG approach that integrates external knowledge directly into model parameters via LoRA adapters, aiming to reduce inference cost compared to traditional RAG. However, current PRAG approaches adopt a \textit{one-to-one} document encoding scheme, training a dedicated LoRA adapter for each individual document. This scheme has two major limitations: 1) as the number of documents grows, training and storage costs become prohibitive; and 2) the adapters may overlap substantially because documents share knowledge, making the approach highly inefficient. To overcome these challenges, we propose Poly-PRAG, which uses a small set of LoRA adapters that encode more general knowledge; each document is encoded as a combination of these adapters selected by a latent routing function. By jointly training the LoRA adapters and the routing function, each adapter learns to encode a portion of the knowledge shared across documents, while the routing function selects the best combination of adapters for each document. Experimental results on four benchmarks demonstrate the effectiveness of Poly-PRAG compared to other strong PRAG baselines. In addition, Poly-PRAG reduces storage requirements by avoiding the need to store a large number of LoRA adapters, offering a more efficient way to encode external knowledge into LLMs.
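The core idea of routing a document to a combination of shared adapters can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all names (`route`, `doc_logits`, the dimensions) are hypothetical, and it assumes per-document routing logits softmax-normalized into mixture weights over K shared LoRA pairs.

```python
import numpy as np

# Hypothetical sketch of the shared-adapter mixing in Poly-PRAG.
# All names and dimensions below are illustrative assumptions.

rng = np.random.default_rng(0)

d, r, K = 16, 4, 3  # hidden size, LoRA rank, number of shared adapters

# K shared LoRA adapters: each low-rank pair (B_k @ A_k) has shape (d, d).
A = rng.normal(size=(K, r, d)) * 0.01
B = rng.normal(size=(K, d, r)) * 0.01

def route(logits):
    """Latent routing: softmax over adapter logits -> mixture weights."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Per-document routing logits (in the paper, trained jointly with the adapters).
doc_logits = np.array([1.0, 0.2, -0.5])
w = route(doc_logits)

# The document-specific weight update is a weighted sum of adapter updates,
# so one small adapter set covers many documents.
delta_W = sum(w[k] * (B[k] @ A[k]) for k in range(K))

print(delta_W.shape)      # (16, 16)
print(round(w.sum(), 6))  # 1.0
```

Under a one-to-one scheme, N documents require N full LoRA pairs; here storage is K adapter pairs plus K routing logits per document, which is what drives the storage savings claimed above.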