Retrieval-augmented large language models (LLMs) have demonstrated efficacy in knowledge-intensive tasks such as open-domain QA, addressing inherent challenges in knowledge update and factual inadequacy. However, inconsistencies between the retrieved knowledge and the knowledge that LLMs actually need lead to a decline in the LLM's answer quality. This paper introduces BIDER, an approach that refines retrieved documents into Key Supporting Evidence (KSE) through knowledge synthesis, supervised fine-tuning (SFT), and preference alignment. We train BIDER by learning from crafted KSE, while using reinforcement learning to align its output with the LLM's information acquisition preferences. Evaluations across five datasets show that BIDER boosts LLMs' answer quality by 7% while reducing the input content length of the retrieved documents by 80%, outperforming existing methods. The proposed KSE simulation effectively equips LLMs with the essential information for accurate question answering.