This paper introduces uRAG, a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for its own purpose, such as open-domain question answering, fact verification, entity linking, or relation extraction. We introduce a generic training guideline that standardizes communication between the search engine and the downstream RAG systems that participate in optimizing the retrieval model. This lays the groundwork for a large-scale experimentation ecosystem consisting of 18 RAG systems that participate in training and 18 unknown RAG systems that use uRAG as new users of the search engine. Using this ecosystem, we answer a number of fundamental research questions that improve our understanding of the promises and challenges of developing search engines for machines.