Retrieval-Augmented Generation (RAG) is a prevalent approach for combining a private knowledge base of documents with Large Language Models (LLMs) to build Generative Q&A (Question-Answering) systems. However, maintaining RAG accuracy becomes increasingly challenging as the corpus of documents scales up, with the Retriever playing an outsized role in overall RAG accuracy by extracting the most relevant documents from the corpus to provide context to the LLM. In this paper, we propose the 'Blended RAG' method, which leverages semantic search techniques such as Dense Vector indexes and Sparse Encoder indexes, blended with hybrid query strategies. Our study achieves better retrieval results and sets new benchmarks on IR (Information Retrieval) datasets such as NQ and TREC-COVID. We further extend this 'Blended Retriever' to the RAG system to demonstrate far superior results on Generative Q&A datasets such as SQuAD, even surpassing fine-tuning performance.
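To make the blending idea concrete, the sketch below fuses a sparse (keyword-overlap) score with a dense (cosine-similarity) score per document via a convex weight. This is a minimal illustrative sketch, not the paper's actual method: the `sparse_score` and `dense_score` functions, the min-overlap scoring, the toy embeddings, and the `alpha` weight are all assumptions standing in for real BM25/sparse-encoder and dense-vector indexes.

```python
# Hypothetical sketch of a hybrid ("blended") query: fuse a sparse keyword
# score with a dense embedding score. All names and weights are illustrative
# assumptions, not the paper's implementation.
import math
from collections import Counter


def sparse_score(query, doc):
    """Toy term-overlap score standing in for a BM25 / sparse-encoder index."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)


def dense_score(q_vec, d_vec):
    """Cosine similarity standing in for a dense vector index lookup."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    nq = math.sqrt(sum(a * a for a in q_vec))
    nd = math.sqrt(sum(b * b for b in d_vec))
    return dot / (nq * nd) if nq and nd else 0.0


def blended_rank(query, q_vec, corpus, alpha=0.5):
    """Rank documents by a convex blend of normalized sparse and dense scores.

    corpus: list of (text, embedding) pairs; alpha weights the sparse side.
    """
    sparse = [sparse_score(query, doc) for doc, _ in corpus]
    dense = [dense_score(q_vec, d_vec) for _, d_vec in corpus]
    s_max = max(sparse) or 1.0  # normalize sparse scores into [0, 1]
    scores = [alpha * (s / s_max) + (1 - alpha) * d
              for s, d in zip(sparse, dense)]
    return sorted(range(len(corpus)), key=lambda i: -scores[i])
```

The top-ranked document would then be passed as context to the LLM; a real system would tune `alpha` and use proper index backends rather than brute-force scoring.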