Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets, which prioritize only top-1 retrieval. Consequently, fan-out retrieval is often employed, generating diverse subqueries to retrieve item sets. While reinforcement learning (RL) can optimize set-level objectives via interaction, deploying an RL-tuned LLM for fan-out retrieval is prohibitively expensive at inference time. Conversely, diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. To address these issues, we propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer in a three-step process: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across large-scale fashion and music benchmarks consisting of curated item sets, we show that R4T improves retrieval quality relative to strong baselines while reducing query-time fan-out latency by an order of magnitude.