Generative Recommendation for Large-Scale Advertising

Ben Xue, Dan Liu, Lixiang Wang, Mingjie Sun, Peng Wang, Pengfei Zhang, Shaoyun Shi, Tianyu Xu, Yunhao Sha, Zhiqiang Liu, Bo Kong, Bo Wang, Hang Yang, Jieting Xue, Junhao Wang, Shengyu Wang, Shuping Hui, Wencai Ye, Xiao Lin, Yongzhi Li, Yuhang Chen, Zhihui Yin, Quan Chen, Shiyang Wen, Wenjin Wu, Han Li, Guorui Zhou, Changcheng Li, Peng Jiang

From arXiv; 13 pages, 6 figures; under review.

Generative recommendation has recently attracted widespread attention in industry because of its potential for scaling and its stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs that go beyond large-language-model (LLM)-style training and serving recipes. We present GR4AD (Generative Recommendation for ADvertising), a production-oriented generative recommender co-designed across architecture, learning, and serving. For tokenization, GR4AD proposes UA-SID (Unified Advertisement Semantic ID) to capture complex business information. GR4AD also introduces LazyAR, a lazy autoregressive decoder that relaxes layer-wise dependencies for short, multi-candidate generation; it preserves effectiveness while reducing inference cost, which makes it easier to scale the model under a fixed serving budget. To align optimization with business value, GR4AD employs VSL (Value-Aware Supervised Learning) and proposes RSPO (Ranking-Guided Softmax Preference Optimization), a ranking-aware, list-wise reinforcement learning algorithm that optimizes value-based rewards under list-level metrics for continual online updates. For online inference, we further propose dynamic beam serving, which adapts the beam width to the generation level and to the online load in order to control compute. Large-scale online A/B tests show ad revenue improvements of up to 4.2% over an existing DLRM-based stack, with consistent gains from both model scaling and inference-time scaling. GR4AD has been fully deployed in the Kuaishou advertising system, which serves over 400 million users, and it sustains high-throughput real-time serving.
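The abstract describes dynamic beam serving only at a high level. As a hypothetical illustration (not GR4AD's implementation), the sketch below assumes that a semantic ID is decoded level by level as a tuple of tokens, that each level has a base beam width, and that online load is summarized as a utilization value in [0, 1] that shrinks every width linearly. The names `beam_schedule`, `beam_search`, and the toy scoring function are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[Tuple[int, ...], float]  # (semantic-ID prefix, cumulative score)


def beam_schedule(base_widths: Sequence[int], load: float, min_width: int = 1) -> List[int]:
    """Hypothetical load-aware schedule: shrink per-level beam widths as load rises.

    load is an assumed utilization signal in [0, 1]; at 0 the base widths are
    used unchanged, at 1 every level falls back to min_width.
    """
    load = min(max(load, 0.0), 1.0)
    return [max(min_width, math.ceil(w * (1.0 - load))) for w in base_widths]


def beam_search(
    score_fn: Callable[[Tuple[int, ...], int], float],
    vocab_sizes: Sequence[int],
    widths: Sequence[int],
) -> List[Beam]:
    """Decode a multi-level semantic ID, keeping widths[i] prefixes after level i.

    score_fn(prefix, token) stands in for the model's per-step log-probability.
    """
    beams: List[Beam] = [((), 0.0)]
    for vocab, width in zip(vocab_sizes, widths):
        # Expand every surviving prefix with every token at this level.
        candidates = [
            (prefix + (tok,), score + score_fn(prefix, tok))
            for prefix, score in beams
            for tok in range(vocab)
        ]
        # Keep only the top-`width` candidates for the next level.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams


if __name__ == "__main__":
    # Toy scorer: prefers token == current level index.
    toy_score = lambda prefix, tok: -abs(tok - len(prefix))
    widths = beam_schedule([4, 4, 4], load=0.5)
    print(widths)
    print(beam_search(toy_score, [4, 4, 4], widths))
```

In this sketch, the number of candidates scored at each level is the previous beam width times that level's vocabulary size, so lowering the widths under high load directly bounds per-request compute. A real system would likely use a non-linear schedule and a batched model call rather than a Python scoring callback.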
