Open-domain question answering (QA) usually requires retrieving relevant information from a large corpus to generate accurate answers. We propose Generator-Retriever-Generator (GRG), a novel approach that combines document retrieval with a large language model (LLM). An LLM is first prompted to generate contextual documents for a given question. In parallel, a dual-encoder network retrieves documents relevant to the question from an external corpus. The generated and retrieved documents are then passed to a second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD), improving on their performance by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints.\footnote{\url{https://github.com/abdoelsayed2016/GRG}}
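The three stages named in the abstract (generate documents, retrieve documents, read both) can be illustrated with a minimal Python sketch. This is not the released GRG implementation: the names `grg_answer` and `build_reader_prompt`, the callable signatures, the prompt layout, and the default document counts are all assumptions made for illustration. The sketch also runs the two document sources one after the other, whereas the abstract describes them as running in parallel.

```python
from typing import Callable, List

# Hypothetical interfaces for illustration only; not taken from the GRG codebase.
Generator = Callable[[str, int], List[str]]  # (question, n) -> generated documents
Retriever = Callable[[str, int], List[str]]  # (question, k) -> retrieved documents
Reader = Callable[[str], str]                # prompt -> final answer


def build_reader_prompt(question: str, documents: List[str]) -> str:
    """Join numbered documents and the question into one reader prompt.

    The prompt layout is an assumption, not the format used in the paper.
    """
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return f"{context}\nQuestion: {question}\nAnswer:"


def grg_answer(
    question: str,
    generate: Generator,
    retrieve: Retriever,
    read: Reader,
    n_generated: int = 2,  # assumed default, not a value from the paper
    k_retrieved: int = 2,  # assumed default, not a value from the paper
) -> str:
    """Generator-Retriever-Generator flow under the assumptions stated above."""
    # Stage 1: a first LLM writes contextual documents for the question.
    generated = generate(question, n_generated)
    # Stage 2: a retriever (a dual encoder in the paper) fetches corpus documents.
    # Run sequentially here for simplicity.
    retrieved = retrieve(question, k_retrieved)
    # Stage 3: a second LLM reads both document sets and produces the answer.
    prompt = build_reader_prompt(question, generated + retrieved)
    return read(prompt)
```

Passing the generator, retriever, and reader as plain callables keeps the sketch independent of any particular model or index library; each can be swapped for a real component without changing the control flow.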