Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance, often selecting either trivial, answer-revealing passages or evidence that lacks the critical information required to answer the question, without considering whether the evidence is suitable for the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone -- evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but is challenging yet sufficient for inference and thus provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback, and adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3 percent over strong RAG and reranking baselines while substantially improving robustness. Code is publicly available at https://github.com/GasolSun36/BAR-RAG.