Most conventional Retrieval-Augmented Generation (RAG) pipelines rely on relevance-based retrieval, which often misaligns with utility -- that is, whether the retrieved passages actually improve the quality of the generated text for a downstream task such as question answering or query-based summarization. Existing utility-driven retrieval approaches for RAG have two limitations: first, they are resource-intensive, typically requiring query encoding; second, they do not employ a listwise ranking loss during training. The latter limitation is particularly critical, since the relative order of documents directly affects generation in RAG. To address this gap, we propose Lightweight Utility-driven Reranking for Efficient RAG (LURE-RAG), a framework that augments any black-box retriever with an efficient LambdaMART-based reranker. Unlike prior methods, LURE-RAG trains the reranker with a listwise ranking loss guided by LLM utility, thereby directly optimizing the ordering of retrieved documents. Experiments on two standard datasets demonstrate that LURE-RAG achieves competitive performance, reaching 97-98% of the state-of-the-art dense neural baseline, while remaining efficient in both training and inference. Moreover, its dense variant, UR-RAG, significantly outperforms the best existing baseline by up to 3%.