Retrieval-augmented generation (RAG) has received much attention in open-domain question answering (ODQA) as a means to compensate for the limits of the parametric knowledge of large language models (LLMs). While previous approaches focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of the retrieved passages, which can degrade when the question is ambiguous or complex. In this paper, we propose a simple yet efficient method, question and passage augmentation via LLMs, for open-domain QA. Our method first decomposes the original question into multi-step sub-questions. By augmenting the original question with detailed sub-questions and planning, we make the query more specific about what needs to be retrieved, improving retrieval performance. In addition, to handle cases where the retrieved passages contain distracting information or divided opinions, we augment them with self-generated passages from an LLM to guide answer extraction. Experimental results show that the proposed scheme outperforms the previous state of the art and achieves significant performance gains over existing RAG methods.
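The pipeline described above can be sketched in minimal, runnable Python. This is an illustration under stated assumptions, not the paper's implementation: the function names (`decompose_question`, `retrieve`, `generate_passage`, `augmented_rag`) are hypothetical, and the LLM and retriever are replaced by trivial stand-in stubs so the control flow is executable.

```python
# Hypothetical sketch of the question- and passage-augmentation RAG pipeline.
# All helpers below are illustrative stubs, not the paper's actual components.

def decompose_question(question: str) -> list[str]:
    """Stand-in for an LLM call that splits a question into sub-questions."""
    parts = question.rstrip("?").split(" and ")
    return [f"Step {i + 1}: {part.strip()}?" for i, part in enumerate(parts)]

def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Toy retriever: return passages sharing at least one word with the query."""
    query_words = set(query.lower().split())
    return [p for p in corpus if query_words & set(p.lower().split())]

def generate_passage(question: str) -> str:
    """Stand-in for an LLM-generated passage conditioned on the question."""
    return f"(model-written context for: {question})"

def augmented_rag(question: str, corpus: list[str]) -> tuple[list[str], list[str]]:
    # 1. Question augmentation: original question plus decomposed sub-questions.
    queries = [question] + decompose_question(question)
    # 2. Retrieve with every query, deduplicating passages in order.
    retrieved: list[str] = []
    for q in queries:
        for p in retrieve(q, corpus):
            if p not in retrieved:
                retrieved.append(p)
    # 3. Passage augmentation: append a self-generated passage to offset
    #    distracting or conflicting retrieved content.
    passages = retrieved + [generate_passage(question)]
    # 4. A reader LLM would extract the answer from `passages`; this sketch
    #    just returns the assembled queries and context.
    return queries, passages
```

In this sketch, the augmented query set widens retrieval coverage while the appended self-generated passage gives the reader a second, model-internal source to weigh against noisy retrieved text, mirroring the two augmentation steps described above.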