Generative recommendation has recently attracted widespread attention in industry because of its scaling potential and greater model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs that go beyond large-language-model (LLM)-style training and serving recipes. We present GR4AD (Generative Recommendation for ADvertising), a production-oriented generative recommender co-designed across architecture, learning, and serving. For tokenization, GR4AD proposes UA-SID (Unified Advertisement Semantic ID) to capture complex business information. GR4AD further introduces LazyAR, a lazy autoregressive decoder that relaxes layer-wise dependencies for short, multi-candidate generation. LazyAR preserves effectiveness while reducing inference cost, which makes it easier to scale under a fixed serving budget. To align optimization with business value, GR4AD employs VSL (Value-Aware Supervised Learning) and proposes RSPO (Ranking-Guided Softmax Preference Optimization), a ranking-aware, list-wise reinforcement learning algorithm that optimizes value-based rewards under list-level metrics and supports continual online updates. For online inference, we further propose dynamic beam serving, which adapts the beam width across generation levels and to online load in order to control compute. Large-scale online A/B tests show up to a 4.2% ad revenue improvement over an existing DLRM-based stack, with consistent gains from both model scaling and inference-time scaling. GR4AD has been fully deployed in the Kuaishou advertising system, which serves over 400 million users, and sustains high-throughput real-time serving.
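The abstract does not specify how dynamic beam serving chooses its widths. The sketch below illustrates the general idea under stated assumptions: each ad is decoded as a fixed-length sequence of semantic-ID tokens (one token per generation level), each level has a hypothetical base beam width, and online load arrives as a utilization value in [0, 1]. The function names `dynamic_beam_widths` and `beam_decode`, the linear "halve at full load" rule, and the toy scorer are all hypothetical illustrations, not GR4AD's actual policy.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A candidate is (semantic-ID prefix, cumulative log-probability).
Candidate = Tuple[Tuple[int, ...], float]


def dynamic_beam_widths(base_widths: Sequence[int], load: float,
                        min_width: int = 1) -> List[int]:
    """Scale per-level beam widths down as online load rises.

    Hypothetical rule: `load` is a utilization signal in [0, 1]; each level
    keeps its full base width when idle and half of it when saturated.
    """
    load = min(max(load, 0.0), 1.0)   # clamp the load signal to [0, 1]
    scale = 1.0 - 0.5 * load          # 1.0 when idle, 0.5 when saturated
    # ceil + min_width guarantee at least one surviving candidate per level.
    return [max(min_width, math.ceil(w * scale)) for w in base_widths]


def beam_decode(score_fn: Callable[[Tuple[int, ...]], Sequence[float]],
                widths: Sequence[int]) -> List[Candidate]:
    """Level-by-level beam search over semantic-ID tokens.

    `score_fn(prefix)` returns next-token log-probs for the current level;
    after level i, at most `widths[i]` candidates are kept.
    """
    beams: List[Candidate] = [((), 0.0)]
    for width in widths:
        candidates: List[Candidate] = []
        for prefix, logp in beams:
            for tok, s in enumerate(score_fn(prefix)):
                candidates.append((prefix + (tok,), logp + s))
        # Python's sort is stable even with reverse=True: ties keep expansion order.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams


def toy_scorer(prefix: Tuple[int, ...]) -> List[float]:
    """Hypothetical scorer: 4 tokens per level, token t has log-prob -(t + 1)."""
    return [-(t + 1.0) for t in range(4)]


if __name__ == "__main__":
    widths = dynamic_beam_widths([8, 4, 2], load=0.5)  # -> [6, 3, 2]
    print(widths, beam_decode(toy_scorer, widths)[0])
```

Shrinking the width per level, rather than cutting generation short, keeps every returned candidate a complete semantic ID while reducing the number of scorer calls under heavy load; a production policy would likely use measured latency budgets rather than a fixed linear rule.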