In retrieval-augmented generation (RAG), document ranking determines the evidence available to the downstream generator. Through controlled analysis, we identify two phenomena that existing rankers leave underexplored: (i) downstream response quality depends not only on relevance but also on the composition and ordering of the selected documents, and (ii) these preferences differ systematically across generators. Because existing rankers are trained purely on query--document relevance, neither phenomenon is modeled. To close this gap, we construct \textbf{PRISM}, a bilingual preference-aligned dataset built through a four-stage pipeline that compresses the combinatorial subset-and-ordering space by roughly four orders of magnitude and produces response-quality preference supervision conditioned on seven downstream generators. On a 13k-query subset of PRISM, we train \textbf{Rank4Gen}, a generator-aware ranker that jointly selects and orders a document set. Experiments on five challenging RAG benchmarks show that Rank4Gen improves downstream QA quality on most evaluated generators, with per-generator F1 gains of up to $+2.08$ over the strongest set-selection baseline. Code is available at https://github.com/JOHNNY-fans/Rank4Gen.