Mainstream ranking approaches typically follow a Generator-Evaluator two-stage paradigm, in which a generator produces candidate lists and an evaluator selects the best one. Recent work has attempted to improve performance by expanding the number of candidate lists, for example through multi-generator settings. However, ranking requires selecting a recommendation list from a combinatorially large space, so simply enlarging the candidate set is ineffective and the performance gains quickly saturate. Meanwhile, recent advances in large recommendation models have shown that end-to-end one-stage models can achieve strong performance and are expected to benefit from scaling laws. Motivated by this, we revisit ranking from a generator-only, one-stage perspective. We theoretically prove that, for any (finite multi-)Generator-Evaluator model, there always exists a generator-only model that achieves a strictly smaller approximation error to the optimal ranking policy, while also enjoying scaling laws as its size increases. Building on this result, we derive an evidence upper bound of the one-stage optimization objective, from which we find that a reward model trained on real user feedback can be used to construct a reference policy in a group-relative manner. This reference policy serves as a practical surrogate for the optimal policy, enabling effective training of a large generator-only ranker. Based on these insights, we propose GoalRank, a generator-only ranking framework. Extensive offline experiments on public benchmarks and large-scale online A/B tests demonstrate that GoalRank consistently outperforms state-of-the-art methods.
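To make the group-relative construction concrete, the following is a minimal sketch of how a reward model's scores over a group of candidate lists could be turned into a reference distribution. The within-group normalization, the softmax form, the temperature parameter, and the function name `group_relative_reference_policy` are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def group_relative_reference_policy(rewards, temperature=1.0):
    """Sketch: build a reference distribution over a group of candidate
    lists from their reward-model scores (hypothetical construction).

    rewards: length-G sequence of reward-model scores for G candidate
    lists generated for the same user request.
    Returns: length-G array of reference probabilities.
    """
    r = np.asarray(rewards, dtype=float)
    # Group-relative normalization: center and scale rewards within the
    # group, so the reference policy depends on relative rather than
    # absolute reward values.
    adv = (r - r.mean()) / (r.std() + 1e-8)
    # A softmax over the group-relative advantages yields a valid
    # probability distribution to use as the reference policy.
    logits = adv / temperature
    logits -= logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: three candidate lists scored by the reward model; higher
# reward receives higher reference probability.
probs = group_relative_reference_policy([0.2, 1.5, 0.7])
```

Because the softmax is monotone in the group-relative advantages, the best-scored list always receives the largest reference probability, while the temperature controls how sharply the reference policy concentrates on it.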