The Generator-Evaluator (G-E) framework, in which a generator produces K candidate sequences and an evaluator scores them so that the top-ranked one can be selected, is a foundational paradigm in tasks such as Recommender Systems (RecSys) and Natural Language Processing (NLP). Traditional evaluators score each sequence independently, which causes two major limitations: (1) they lack explicit cross-sequence comparison, leading to suboptimal selection accuracy; and (2) they parallelize poorly, with complexity linear in the number of candidates, O(K), which leads to inefficient resource utilization and hurts both throughput and latency. To address these challenges, we propose FlashEvaluator, which shares token information across sequences and processes all K sequences in a single forward pass. This design yields sublinear computational complexity, improving system efficiency, and enables direct inter-sequence comparison, improving selection accuracy. We also provide theoretical proofs and extensive experiments on recommendation and NLP tasks that demonstrate clear advantages over conventional methods. Notably, FlashEvaluator has been deployed in Kuaishou's online recommender system, where it has delivered substantial and sustained revenue gains.
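The G-E selection step described above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not FlashEvaluator's architecture: candidates are hypothetical lists of token embeddings, `pointwise_score` stands in for a conventional independent evaluator, and `listwise_scores` is an illustrative joint evaluator in which each candidate attends to the others. The sketch only contrasts the number of evaluator invocations (K versus one) and shows how a score can depend on the rest of the candidate set; it does not reproduce the sublinear complexity reported for FlashEvaluator, since the toy joint scorer still performs O(K²) dot products internally.

```python
import math
from typing import Callable, List

# Hypothetical toy representation: a candidate sequence is a list of token
# embeddings, and each embedding is a list of floats.
Candidate = List[List[float]]


class CountingEvaluator:
    """Wraps a scoring function and counts how many times it is invoked."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


def mean_pool(seq: Candidate) -> List[float]:
    """Average the token embeddings of one sequence into a single vector."""
    dim = len(seq[0])
    return [sum(tok[d] for tok in seq) / len(seq) for d in range(dim)]


def pointwise_score(seq: Candidate) -> float:
    """Toy independent evaluator: scores one sequence without seeing the others."""
    return sum(mean_pool(seq))


def listwise_scores(cands: List[Candidate]) -> List[float]:
    """Toy joint evaluator: scores all K candidates in one call.

    Each pooled candidate attends to every candidate (softmax over dot
    products); its score is its own pointwise score minus the attention-
    weighted scores of all candidates, so it is relative to the others.
    """
    pooled = [mean_pool(c) for c in cands]
    base = [sum(p) for p in pooled]
    scores = []
    for i, p in enumerate(pooled):
        logits = [sum(a * b for a, b in zip(p, q)) for q in pooled]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        context = sum(w * s for w, s in zip(weights, base)) / z
        scores.append(base[i] - context)
    return scores


def select_independent(cands: List[Candidate], evaluate_one: Callable) -> int:
    """Conventional G-E selection: one evaluator call per candidate (K calls)."""
    scores = [evaluate_one(c) for c in cands]
    return max(range(len(cands)), key=scores.__getitem__)


def select_joint(cands: List[Candidate], evaluate_all: Callable) -> int:
    """Joint G-E selection: a single evaluator call sees all K candidates."""
    scores = evaluate_all(cands)
    return max(range(len(cands)), key=scores.__getitem__)


# Usage with K = 3 toy candidates of two 2-dimensional tokens each.
cands = [
    [[0.1, 0.2], [0.3, 0.0]],
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.2, 0.2], [0.2, 0.2]],
]
one = CountingEvaluator(pointwise_score)
joint = CountingEvaluator(listwise_scores)
print(select_independent(cands, one), one.calls)  # 1 3
print(select_joint(cands, joint), joint.calls)    # 1 1
```

Passing the whole candidate set to one call is what allows the joint evaluator to compare candidates directly; a pointwise evaluator can only rank scores that were computed in isolation.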