Mainstream ranking approaches typically follow a Generator-Evaluator two-stage paradigm, in which a generator produces candidate lists and an evaluator selects the best one. Recent work has attempted to improve performance by expanding the number of candidate lists, for example through multi-generator settings. However, ranking requires selecting a recommendation list from a combinatorially large space, so simply enlarging the candidate set is ineffective and the performance gains quickly saturate. Meanwhile, recent advances in large recommendation models have shown that end-to-end one-stage models can achieve strong performance and are expected to benefit from scaling laws. Motivated by this, we revisit ranking from a generator-only, one-stage perspective. We theoretically prove that, for any (finite multi-)Generator-Evaluator model, there always exists a generator-only model that achieves a strictly smaller approximation error to the optimal ranking policy, while also enjoying scaling laws as its size increases. Building on this result, we derive an evidence upper bound of the one-stage optimization objective, from which we find that a reward model trained on real user feedback can be used to construct a reference policy in a group-relative manner. This reference policy serves as a practical surrogate for the optimal policy, enabling effective training of a large generator-only ranker. Based on these insights, we propose GoalRank, a generator-only ranking framework. Extensive offline experiments on public benchmarks and large-scale online A/B tests demonstrate that GoalRank consistently outperforms state-of-the-art methods.
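To make the group-relative construction concrete, the following is a minimal sketch of how a reward model's scores over a group of candidate lists could be turned into a reference distribution. The within-group normalization, the softmax form, the temperature parameter, and the function name `group_relative_reference_policy` are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def group_relative_reference_policy(rewards, temperature=1.0):
    """Sketch: build a reference distribution over a group of candidate
    lists from their reward-model scores (hypothetical construction).

    rewards: length-G sequence of reward-model scores for G candidate
    lists generated for the same user request.
    Returns: length-G array of reference probabilities.
    """
    r = np.asarray(rewards, dtype=float)
    # Group-relative normalization: center and scale rewards within the
    # group, so the reference policy depends on relative rather than
    # absolute reward values.
    adv = (r - r.mean()) / (r.std() + 1e-8)
    # A softmax over the group-relative advantages yields a valid
    # probability distribution to use as the reference policy.
    logits = adv / temperature
    logits -= logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: three candidate lists scored by the reward model; higher
# reward receives higher reference probability.
probs = group_relative_reference_policy([0.2, 1.5, 0.7])
```

Because the softmax is monotone in the group-relative advantages, the best-scored list always receives the largest reference probability, while the temperature controls how sharply the reference policy concentrates on it.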