Generative recommendation has recently attracted widespread attention in industry because of its potential for scaling and its stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs that go beyond large-language-model (LLM)-style training and serving recipes. We present GR4AD (Generative Recommendation for ADvertising), a production-oriented generative recommender co-designed across architecture, learning, and serving. For tokenization, GR4AD proposes UA-SID (Unified Advertisement Semantic ID) to capture complex business information. For decoding, GR4AD introduces LazyAR, a lazy autoregressive decoder that relaxes layer-wise dependencies for short, multi-candidate generation; it preserves effectiveness while reducing inference cost, which facilitates scaling under fixed serving budgets. To align optimization with business value, GR4AD employs VSL (Value-Aware Supervised Learning) and proposes RSPO (Ranking-Guided Softmax Preference Optimization), a ranking-aware, list-wise reinforcement learning algorithm that optimizes value-based rewards under list-level metrics for continual online updates. For online inference, we further propose dynamic beam serving, which adapts beam width across generation levels and online load to control compute. Large-scale online A/B tests show up to a 4.2% improvement in ad revenue over an existing DLRM-based stack, with consistent gains from both model scaling and inference-time scaling. GR4AD has been fully deployed in the Kuaishou advertising system, which serves over 400 million users, and achieves high-throughput real-time serving.
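The abstract describes dynamic beam serving only at a high level: beam width varies by generation level and by online load. The following is a minimal sketch of that idea, not the GR4AD implementation. The function names `beam_widths`, `dynamic_beam_search`, and `toy_log_probs`, the linear load-scaling rule, and the four-token vocabulary are all assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[float, Tuple[int, ...]]  # (cumulative log-probability, token prefix)


def beam_widths(base_widths: Sequence[int], load: float, min_width: int = 1) -> List[int]:
    """Shrink each level's beam width as load rises from 0.0 to 1.0.

    Assumed policy (not from the paper): widths scale linearly down to
    half of their base value at full load.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    scale = 1.0 - 0.5 * load
    return [max(min_width, math.floor(w * scale)) for w in base_widths]


def dynamic_beam_search(
    log_prob_fn: Callable[[Tuple[int, ...]], Sequence[float]],
    widths: Sequence[int],
) -> List[Beam]:
    """Beam search that keeps widths[i] hypotheses after generation level i."""
    beams: List[Beam] = [(0.0, ())]
    for width in widths:
        candidates: List[Beam] = []
        for logp, prefix in beams:
            # Expand every surviving prefix with every possible next token.
            for token, token_logp in enumerate(log_prob_fn(prefix)):
                candidates.append((logp + token_logp, prefix + (token,)))
        # Keep only the best `width` hypotheses for this level.
        beams = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return beams


def toy_log_probs(prefix: Tuple[int, ...]) -> List[float]:
    """Hypothetical scorer: at level k, token k is the most likely of 4 tokens."""
    logits = [-abs(token - len(prefix)) for token in range(4)]
    log_z = math.log(sum(math.exp(x) for x in logits))
    return [x - log_z for x in logits]


if __name__ == "__main__":
    widths = beam_widths([8, 4, 2], load=1.0)  # widths shrink under full load
    for logp, tokens in dynamic_beam_search(toy_log_probs, widths):
        print(tokens, round(logp, 3))
```

Separating the width schedule from the search loop lets a serving layer recompute `widths` per request from current load without touching the decoder. A production policy would presumably be tuned against latency and revenue targets rather than using a fixed linear rule.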