Generative Recommendation for Large-Scale Advertising

Ben Xue, Dan Liu, Lixiang Wang, Mingjie Sun, Peng Wang, Pengfei Zhang, Shaoyun Shi, Tianyu Xu, Yunhao Sha, Zhiqiang Liu, Bo Kong, Bo Wang, Hang Yang, Jieting Xue, Junhao Wang, Shengyu Wang, Shuping Hui, Wencai Ye, Xiao Lin, Yongzhi Li, Yuhang Chen, Zhihui Yin, Quan Chen, Shiyang Wen, Wenjin Wu, Han Li, Guorui Zhou, Changcheng Li, Peng Jiang, Kun Gai

From arXiv; 13 pages, 6 figures; under review

Generative recommendation has recently attracted widespread attention in industry because of its potential for scaling and its stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs that go beyond large-language-model (LLM)-style training and serving recipes. We present GR4AD (Generative Recommendation for ADvertising), a production-oriented generative recommender co-designed across architecture, learning, and serving. For tokenization, GR4AD proposes UA-SID (Unified Advertisement Semantic ID) to capture complex business information. GR4AD also introduces LazyAR, a lazy autoregressive decoder that relaxes layer-wise dependencies for short, multi-candidate generation; it preserves effectiveness while reducing inference cost, which facilitates scaling under fixed serving budgets. To align optimization with business value, GR4AD employs VSL (Value-Aware Supervised Learning) and proposes RSPO (Ranking-Guided Softmax Preference Optimization), a ranking-aware, list-wise reinforcement learning algorithm that optimizes value-based rewards under list-level metrics for continual online updates. For online inference, we further propose dynamic beam serving, which adapts beam width across generation levels and online load to control compute. Large-scale online A/B tests show up to 4.2% ad revenue improvement over an existing DLRM-based stack, with consistent gains from both model scaling and inference-time scaling. GR4AD has been fully deployed in the Kuaishou advertising system, which serves over 400 million users, and achieves high-throughput real-time serving.
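
The abstract describes dynamic beam serving only at a high level: beam width varies by generation level and by online load. The sketch below illustrates that idea and is not the authors' implementation. The names `beam_widths`, `dynamic_beam_search`, and `toy_step_log_probs`, the linear load-scaling rule, and the toy token distribution are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[Tuple[int, ...], float]  # (token prefix, cumulative log-probability)


def beam_widths(base_widths: Sequence[int], load: float, min_width: int = 1) -> List[int]:
    """Scale per-level beam widths down as load rises (0.0 = idle, 1.0 = saturated).

    Assumed policy, not from the paper: widths shrink linearly to half at full load.
    """
    load = min(max(load, 0.0), 1.0)
    scale = 1.0 - 0.5 * load
    return [max(min_width, int(w * scale)) for w in base_widths]


def dynamic_beam_search(
    step_log_probs: Callable[[Tuple[int, ...]], Sequence[float]],
    widths: Sequence[int],
) -> List[Beam]:
    """Beam search over a fixed-length ID, using a different beam width per level."""
    beams: List[Beam] = [((), 0.0)]
    for width in widths:
        candidates: List[Beam] = []
        for prefix, score in beams:
            # Expand every surviving prefix by every possible next token.
            for token, log_p in enumerate(step_log_probs(prefix)):
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the top `width` candidates at this level.
        candidates.sort(key=lambda beam: beam[1], reverse=True)
        beams = candidates[:width]
    return beams


def toy_step_log_probs(prefix: Tuple[int, ...]) -> List[float]:
    """Hypothetical stand-in for a decoder: same 3-token distribution at every step."""
    return [math.log(0.5), math.log(0.3), math.log(0.2)]


if __name__ == "__main__":
    for load in (0.0, 1.0):
        widths = beam_widths([2, 4, 8], load)
        results = dynamic_beam_search(toy_step_log_probs, widths)
        print(f"load={load}: widths={widths}, candidates returned={len(results)}")
```

In this sketch the final level's width bounds how many candidates are returned, and the work at each level is proportional to the previous level's width, so narrowing early levels under high load reduces compute the most.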
