Retrieval-augmented generation (RAG) has received much attention in open-domain question answering (ODQA) as a means to compensate for the parametric knowledge of large language models (LLMs). While previous approaches have focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of the retrieved passages, which can degrade when the question is ambiguous or complex. In this paper, we propose a simple yet efficient method called question and passage augmentation (QPaug) via LLMs for open-domain QA. QPaug first decomposes the original question into multi-step sub-questions. By augmenting the original question with detailed sub-questions and planning, the query becomes more specific about what needs to be retrieved, improving retrieval performance. In addition, to handle cases where the retrieved passages contain distracting information or divided opinions, we augment the retrieved passages with passages self-generated by LLMs to guide answer extraction. Experimental results show that QPaug outperforms the previous state of the art and achieves significant performance gains over existing RAG methods. The source code is available at \url{https://github.com/kmswin1/QPaug}.
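The two-stage pipeline sketched in the abstract (question augmentation via sub-question decomposition, then passage augmentation with a self-generated passage) can be illustrated as follows. This is a minimal sketch, not the paper's implementation: the `llm` and `retrieve` callables are hypothetical stand-ins for an LLM API and a passage retriever, and the prompt wording is illustrative only.

```python
def decompose(question, llm):
    """Question augmentation: ask the LLM to break the original
    question into multi-step sub-questions (one per line)."""
    return llm(f"Decompose into sub-questions: {question}").split("\n")


def qpaug_answer(question, llm, retrieve):
    """End-to-end QPaug-style flow (sketch).

    1. Decompose the question and append the sub-questions so the
       retrieval query is more specific about what is needed.
    2. Retrieve passages with the augmented query.
    3. Passage augmentation: add an LLM-generated passage to guard
       against distracting or conflicting retrieved content.
    4. Extract the final answer from the combined passages.
    """
    sub_questions = decompose(question, llm)
    augmented_query = question + " " + " ".join(sub_questions)
    retrieved = retrieve(augmented_query)
    generated = llm(f"Write a passage answering: {question}")
    passages = retrieved + [generated]
    return llm(f"Answer '{question}' using: {' '.join(passages)}")
```

In practice, `llm` would wrap a chat-completion call and `retrieve` a dense or sparse retriever over a passage index; the sketch only shows how the two augmentation steps compose.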