Large Language Models (LLMs) serve as a powerful \textit{Reader} in the \textit{Retrieve-then-Read} pipeline and have made great progress in knowledge-based open-domain tasks. This work introduces a new framework, \textit{Rewrite-Retrieve-Read}, that improves retrieval-augmented methods from the perspective of query rewriting. Prior studies mostly adapt the retriever or stimulate the reader. In contrast, our approach focuses on adapting the query itself, because the original query is not always optimal for retrieval on behalf of the LLM, especially in real-world settings. (1) We first prompt an LLM to rewrite the queries and then conduct retrieval-augmented reading. (2) We further apply a small language model as a trainable rewriter, which rewrites the search query to cater to the frozen retriever and the LLM reader. To fine-tune the rewriter, we first use pseudo data for supervised warm-up training. The \textit{Retrieve-then-Read} pipeline is then modeled as a reinforcement learning context, and the rewriter is further trained as a policy model by maximizing a reward based on pipeline performance. Evaluation is performed on two downstream tasks, open-domain QA and multiple-choice QA. The results show that our framework is effective and scalable.
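The control flow of \textit{Rewrite-Retrieve-Read} described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rewriter, retriever, and reader are passed in as plain callables, the toy components (`toy_rewriter`, `toy_retriever`, `toy_reader`, `CORPUS`) are hypothetical stand-ins for a language model, a search engine, and an LLM reader, and `pipeline_reward` uses exact match only as one plausible example of a reward based on pipeline performance.

```python
from typing import Callable, List


def rewrite_retrieve_read(
    query: str,
    rewriter: Callable[[str], str],
    retriever: Callable[[str], List[str]],
    reader: Callable[[str, List[str]], str],
) -> str:
    """Run the three stages in order: rewrite, retrieve, read."""
    search_query = rewriter(query)        # adapt the query for the retriever
    documents = retriever(search_query)   # frozen retriever, treated as a black box
    return reader(query, documents)       # reader sees the original query plus context


def pipeline_reward(prediction: str, gold: str) -> float:
    """Assumed reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


# --- Hypothetical toy components, for illustration only ---
CORPUS = {"capital of france": ["Paris is the capital of France."]}


def toy_rewriter(query: str) -> str:
    # Stand-in for an LLM or a small trainable model: normalize the query.
    return query.lower().rstrip("?")


def toy_retriever(search_query: str) -> List[str]:
    # Stand-in for a search engine: return documents whose key occurs in the query.
    return [doc for key, docs in CORPUS.items() if key in search_query for doc in docs]


def toy_reader(query: str, documents: List[str]) -> str:
    # Stand-in for the LLM reader: answer only if the context supports it.
    return "Paris" if any("Paris" in doc for doc in documents) else "unknown"


if __name__ == "__main__":
    question = "What is the capital of France?"
    with_rewrite = rewrite_retrieve_read(question, toy_rewriter, toy_retriever, toy_reader)
    without_rewrite = rewrite_retrieve_read(question, lambda q: q, toy_retriever, toy_reader)
    print(with_rewrite, pipeline_reward(with_rewrite, "Paris"))
    print(without_rewrite, pipeline_reward(without_rewrite, "Paris"))
```

In this toy setting the unmodified question retrieves nothing because the case-sensitive retriever cannot match it, while the rewritten query does. In a reinforcement learning setup, a scalar such as the value returned by `pipeline_reward` would be the signal used to update the rewriter while the retriever and reader stay fixed.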