OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) improves generation quality by incorporating evidence retrieved from large external corpora. However, most existing methods statically select the top-k passages by individual relevance. This fails to exploit combinatorial gains among passages and often introduces substantial redundancy. To address this limitation, we propose OptiSet, a set-centric framework that unifies set selection and set-level ranking for RAG. OptiSet adopts an "Expand-then-Refine" paradigm: it first expands a query into multiple perspectives to build a diverse candidate pool, and then refines that pool through re-selection to form a compact evidence set. We then devise a self-synthesis strategy that requires no supervision from a strong LLM. It derives preference labels from changes in the generator's set-conditional utility, thereby identifying complementary and redundant evidence. Finally, we introduce a set-listwise training strategy that jointly optimizes set selection and set-level ranking, enabling the model to favor compact, high-gain evidence sets. Extensive experiments demonstrate that OptiSet improves performance on complex combinatorial problems and makes generation more efficient. The source code is publicly available.
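The abstract does not give OptiSet's utility function or labeling rule. As a rough illustration only, the sketch below assumes a hypothetical `utility` callable that scores an evidence set, for instance by how well the generator answers when conditioned on that set. It labels a candidate passage as complementary when adding it raises utility by more than a threshold `tau`, and as redundant otherwise. It then re-selects a compact set by greedy marginal gain. The greedy rule, `tau`, `label_by_marginal_gain`, `greedy_refine`, and the toy fact-coverage utility are all stand-ins, not the paper's method.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical utility: maps an evidence set to a scalar score, e.g. how well
# the generator answers the query when conditioned on that set (assumption).
Utility = Callable[[FrozenSet[str]], float]


def label_by_marginal_gain(
    base: FrozenSet[str], candidates: List[str], utility: Utility, tau: float = 0.0
) -> Dict[str, str]:
    """Label each candidate by the utility change it causes when added to `base`."""
    base_score = utility(base)
    labels: Dict[str, str] = {}
    for p in candidates:
        gain = utility(base | {p}) - base_score
        labels[p] = "complementary" if gain > tau else "redundant"
    return labels


def greedy_refine(
    candidates: List[str], utility: Utility, max_size: int, tau: float = 0.0
) -> List[str]:
    """Greedily add the passage with the largest marginal gain until none exceeds `tau`."""
    selected: FrozenSet[str] = frozenset()
    order: List[str] = []
    remaining = list(candidates)
    while remaining and len(order) < max_size:
        current = utility(selected)
        best, best_gain = max(
            ((p, utility(selected | {p}) - current) for p in remaining),
            key=lambda pair: pair[1],
        )
        if best_gain <= tau:
            break  # every remaining passage is redundant given the current set
        selected = selected | {best}
        order.append(best)
        remaining.remove(best)
    return order


# Toy utility (hypothetical): required facts covered, minus a per-passage cost.
FACTS = {"p1": {"a"}, "p2": {"a"}, "p3": {"b"}, "p4": set()}
NEEDED = {"a", "b"}


def toy_utility(evidence: FrozenSet[str]) -> float:
    covered = set().union(*(FACTS[p] for p in evidence))
    return len(covered & NEEDED) - 0.1 * len(evidence)
```

With this toy utility, `greedy_refine(["p1", "p2", "p3", "p4"], toy_utility, max_size=3)` returns `["p1", "p3"]`: once `p1` is selected, `p2` only repeats fact `a`, so its marginal gain is negative and it is labeled redundant.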

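The abstract gives no formula for its set-listwise training objective. One standard listwise objective that could fill this role is the ListNet top-one cross-entropy, sketched here over a list of candidate evidence sets. Utility-derived targets are turned into a distribution with a temperature-scaled softmax, and the model's set scores are penalized for diverging from that distribution. The choice of loss and the names `listwise_loss`, `set_scores`, `set_utilities`, and `temperature` are assumptions for illustration, not OptiSet's actual training code.

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    """Numerically stable log-softmax (shifts by the max before exponentiating)."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def listwise_loss(
    set_scores: List[float], set_utilities: List[float], temperature: float = 1.0
) -> float:
    """ListNet-style top-one cross-entropy over a list of candidate evidence sets.

    `set_scores` are the model's scores for each candidate set; `set_utilities`
    are hypothetical utility-derived targets for the same sets.
    """
    if len(set_scores) != len(set_utilities) or not set_scores:
        raise ValueError("need equal-length, non-empty score and utility lists")
    # Target distribution: softmax of utilities at the given temperature.
    target = [math.exp(v) for v in log_softmax([u / temperature for u in set_utilities])]
    # Predicted log-distribution from the model's set scores.
    log_pred = log_softmax(set_scores)
    return -sum(t * lp for t, lp in zip(target, log_pred))
```

Scores ordered like the utilities, e.g. `[2.0, 1.0, 0.0]` against utilities `[1.8, 0.9, 0.0]`, give a lower loss than the reversed order `[0.0, 1.0, 2.0]`, so minimizing it pushes the model to rank high-utility sets first.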

Related Content

Sorting

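Sorting is an operation frequently performed by computers: it rearranges an "unordered" sequence of records into an "ordered" one. Sorting is divided into internal sorting and external sorting. If the entire sorting process can be completed without accessing external storage, the problem is called internal sorting. Conversely, if the number of records is so large that the whole sequence cannot be sorted in main memory, the problem is called external sorting. Internal sorting proceeds by progressively extending the length of the ordered portion of the record sequence.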
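To make the internal/external distinction concrete, here is a minimal Python sketch of two-phase external merge sort, a standard textbook technique rather than anything specified in the text above. It assumes integer records and a hypothetical memory budget `run_size`, the number of records that fit in memory at once; `external_sort`, `_write_run`, and `_read_run` are illustrative names. Phase 1 internally sorts each memory-sized chunk and spills it to disk as a sorted run; phase 2 merges the runs with `heapq.merge`, which holds only one pending record per run.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_chunk: List[int], directory: str) -> str:
    """Write one sorted run to its own temporary file and return the path."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Lazily stream a run back from disk, one record at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(records: Iterable[int], run_size: int) -> List[int]:
    """Two-phase external merge sort with at most `run_size` records per in-memory run."""
    if run_size < 1:
        raise ValueError("run_size must be positive")
    with tempfile.TemporaryDirectory() as tmp:
        # Phase 1: internal sort of memory-sized chunks, each spilled to disk.
        run_paths: List[str] = []
        chunk: List[int] = []
        for record in records:
            chunk.append(record)
            if len(chunk) == run_size:
                run_paths.append(_write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            run_paths.append(_write_run(sorted(chunk), tmp))
        # Phase 2: k-way merge of the sorted runs.
        # Collected into a list for demonstration only; a real external sort
        # would stream the merged output to a file instead.
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))
```

For example, `external_sort([3, 1, 2], run_size=1)` writes three one-record runs and merges them into `[1, 2, 3]`. Keeping `run_size` as an explicit parameter makes the memory bound visible: only one run is ever sorted in memory, and the merge phase needs just one record per run.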

[AAAI 2026] TruthfulRAG: Resolving Factual-Level Conflicts in Retrieval-Augmented Generation with Knowledge Graphs

Zhuanzhi Member Services

November 15, 2025

Retrieval-Augmented Generation (RAG) Techniques: a 261-Page Slide Deck

Zhuanzhi Member Services

October 16, 2025

[New Book] Essential GraphRAG: Knowledge Graph-Enhanced RAG

Zhuanzhi Member Services

July 17, 2025

[SIGIR 2025 Tutorial] Dynamic and Parametric Retrieval-Augmented Generation

Zhuanzhi Member Services

July 14, 2025