Retrieval-augmented generation (RAG) grounds large language models with external evidence, but under a limited context budget, the key challenge is deciding which retrieved passages should be injected. We show that retrieval relevance metrics (e.g., NDCG) correlate weakly with end-to-end QA quality and can even become negatively correlated under multi-passage injection, where redundancy and mild conflicts destabilize generation. We propose \textbf{Information Gain Pruning (IGP)}, a deployment-friendly reranking-and-pruning module that selects evidence using a generator-aligned utility signal and filters weak or harmful passages before truncation, without changing existing budget interfaces. Across five open-domain QA benchmarks and multiple retrievers and generators, IGP consistently improves the quality--cost trade-off. In a representative multi-evidence setting, IGP delivers about +12--20% relative improvement in average F1 while reducing final-stage input tokens by roughly 76--79% compared to retriever-only baselines.