Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance, often selecting either trivial, answer-revealing passages or evidence that lacks the critical information required to answer the question, without considering whether the evidence is suitable for the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone -- evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but is challenging yet sufficient for inference and thus provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback, and adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3 percent over strong RAG and reranking baselines while substantially improving robustness. Code is publicly available at https://github.com/GasolSun36/BAR-RAG.