Retrieval-Augmented Generation (RAG) has advanced open-domain question answering by incorporating external information into model reasoning. However, leveraging that information effectively poses two challenges: (1) a low signal-to-noise ratio, where answer-supporting information is diluted by irrelevant material, and (2) error accumulation, which arises in multi-hop reasoning when incomplete or misleading information is incorporated. To address these challenges, we introduce EviNote-RAG, a framework that follows a retrieve-note-answer workflow. Instead of reasoning directly over raw external information, the model first produces Supportive-Evidence Notes (SENs), which concisely preserve answer-critical information and explicitly mark key and uncertain information to improve accuracy. We further design an entailment-based Evidence Quality Reward (EQR) to ensure that SENs are logically sufficient to derive the final answer, thereby improving their quality. Experiments on both in-domain and out-of-domain QA benchmarks show that EviNote-RAG achieves state-of-the-art performance, improving answer accuracy, training stability, robustness, and efficiency. In particular, it yields relative F1 gains of 20% on HotpotQA (+0.093), 40% on Bamboogle (+0.151), and 91% on 2Wiki (+0.256), gains that stem from improvements in the reasoning process.
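To make the retrieve-note-answer workflow and the entailment-based reward more concrete, the following is a minimal Python sketch, not the authors' implementation. All names (`Note`, `toy_entails`, `evidence_quality_reward`, `retrieve_note_answer`) are hypothetical, the binary 0/1 reward is an assumption, and the token-containment check is only a stand-in for a real entailment model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Note:
    """Hypothetical container for a Supportive-Evidence Note (SEN)."""
    text: str


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Stand-in for an entailment judge (assumption, not a real NLI model).

    Returns True when every whitespace-separated token of the hypothesis
    also appears in the premise, ignoring case.
    """
    premise_tokens = set(premise.lower().split())
    return all(tok in premise_tokens for tok in hypothesis.lower().split())


def evidence_quality_reward(
    note: Note,
    answer: str,
    entails: Callable[[str, str], bool] = toy_entails,
) -> float:
    """Assumed binary reward: 1.0 if the note alone supports the answer."""
    return 1.0 if entails(note.text, answer) else 0.0


def retrieve_note_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    write_note: Callable[[str, List[str]], Note],
    answer_from_note: Callable[[str, Note], str],
) -> Tuple[Note, str]:
    """Run the three stages in order: retrieve, write a note, then answer."""
    passages = retrieve(question)
    note = write_note(question, passages)
    # The answer is derived from the note, not from the raw passages.
    answer = answer_from_note(question, note)
    return note, answer


if __name__ == "__main__":
    # Stub components; a real system would call a retriever and a language model.
    note, answer = retrieve_note_answer(
        "what is the capital of france",
        retrieve=lambda q: ["paris is the capital of france", "the eiffel tower is tall"],
        write_note=lambda q, passages: Note(passages[0]),
        answer_from_note=lambda q, n: "paris",
    )
    print(note.text, answer, evidence_quality_reward(note, answer))
```

The design point this sketch tries to capture is that the answer stage sees only the note, so a reward computed from the note and the answer alone can signal whether the note kept enough evidence.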