Retrieval-augmented large language models (LLMs) have proven effective on knowledge-intensive tasks such as open-domain QA, mitigating inherent challenges in knowledge updating and factual inadequacy. However, inconsistencies between the retrieved knowledge and the knowledge that LLMs actually need lead to a decline in LLMs' answer quality. This paper introduces BIDER, an approach that refines retrieved documents into Key Supporting Evidence (KSE) through knowledge synthesis, supervised fine-tuning (SFT), and preference alignment. We train BIDER by learning to craft KSE, while using reinforcement learning to align its output with the LLM's information acquisition preferences. Evaluations across five datasets show that BIDER boosts LLMs' answer quality by 7% while reducing the input length of the retrieved documents by 80%, outperforming existing methods. The proposed KSE simulation effectively equips LLMs with the essential information for accurate question answering.