Rank4Gen: RAG-Preference-Aligned Document Set Selection and Ranking

In the RAG paradigm, the information retrieval module provides context to the generator by retrieving and ranking multiple documents that support evidence aggregation. However, existing ranking models are primarily optimized for query-document relevance, which often misaligns with generators' preferences for evidence selection and citation and so limits their impact on response quality. Moreover, most approaches do not account for preference differences across generators, which makes performance unstable across generators. We propose Rank4Gen, a generator-aware ranker for RAG that targets the goal of "Ranking for Generators". Rank4Gen introduces two key preference modeling strategies: (1) From Ranking Relevance to Response Quality, which optimizes ranking with respect to downstream response quality rather than query-document relevance; and (2) Generator-Specific Preference Modeling, which conditions a single ranker on different generators to capture their distinct ranking preferences. To enable such modeling, we construct PRISM, a dataset built from multiple open-source corpora and diverse downstream generators. Experiments on five challenging, recent RAG benchmarks show that Rank4Gen achieves strong, competitive performance on complex evidence composition in RAG.
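
The abstract does not say how downstream response quality becomes ranking supervision. The sketch below shows one plausible reading of the two strategies and is an assumption, not the PRISM construction procedure. For each generator, it exhaustively scores every size-k document set with a hypothetical `quality(generator, query, subset)` function and keeps the best set as that generator's target. It contrasts this with a conventional relevance-ordered top-k. The relevance scores, the generator names `gen-a` and `gen-b`, and the `toy_quality` preferences are all invented for illustration. A real scorer would call the generator and grade its answer.

```python
from itertools import combinations
from typing import Callable

# Hypothetical scorer: (generator, query, document set) -> response-quality score.
QualityFn = Callable[[str, str, tuple[str, ...]], float]

# Invented query-document relevance scores (what a conventional ranker sorts by).
RELEVANCE = {"d1": 0.9, "d2": 0.8, "d3": 0.4}


def relevance_top_k(docs: list[str], k: int) -> tuple[str, ...]:
    """Relevance-only baseline: the k documents with the highest relevance."""
    return tuple(sorted(docs, key=RELEVANCE.__getitem__, reverse=True)[:k])


def best_set_for_generator(query: str, docs: list[str], generator: str,
                           k: int, quality: QualityFn) -> tuple[tuple[str, ...], float]:
    """Pick the size-k document set that maximizes downstream quality for one
    generator. Exhaustive search, so only feasible for small candidate pools."""
    best_set: tuple[str, ...] = ()
    best_score = float("-inf")
    for subset in combinations(docs, k):
        score = quality(generator, query, subset)
        if score > best_score:  # strict '>' keeps the first set on ties
            best_set, best_score = subset, score
    return best_set, best_score


def toy_quality(generator: str, query: str, subset: tuple[str, ...]) -> float:
    """Stub with made-up preferences: 'gen-a' answers much better when the
    low-relevance but complementary d3 is present; 'gen-b' only tracks relevance."""
    score = sum(RELEVANCE[d] for d in subset)
    if generator == "gen-a" and "d3" in subset:
        score += 1.0
    return score


if __name__ == "__main__":
    docs = ["d1", "d2", "d3"]
    print("relevance top-2:", relevance_top_k(docs, 2))
    for gen in ("gen-a", "gen-b"):
        chosen, _ = best_set_for_generator("q", docs, gen, 2, toy_quality)
        print(gen, "->", chosen)
```

Under these toy preferences, `gen-a` gets a different target set than the relevance baseline, while `gen-b` gets the same one. This is the kind of per-generator difference that a single ranker conditioned on the generator would need to learn.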

