GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation

Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, Jiawei Sun, Xin Xu, Zishuai Zhang, Ruoran Liu, Suyuan Huang, Zhaoxin Zhang, Zhengkai Guo, Shuojin Yang, Meng-Hao Guo, Huan Yu, Jie Jiang, Shi-Min Hu

From arXiv, 12 pages, 5 figures

As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. However, existing multi-stage advertising recommendation systems suffer from objective misalignment and error propagation, making it difficult to achieve global optimality, while unified generative recommendation models still struggle to meet the demands of practical industrial applications. To address these issues, we propose GPR (Generative Pre-trained Recommender), the first one-model framework that redefines advertising recommendation as an end-to-end generative task, replacing the traditional cascading paradigm with a unified generative approach. To realize GPR, we introduce three key innovations spanning unified representation, network architecture, and training strategy. First, we design a unified input schema and tokenization method tailored to advertising scenarios, mapping both ads and organic content into a shared multi-level semantic ID space, thereby enhancing semantic alignment and modeling consistency across heterogeneous data. Second, we develop the Heterogeneous Hierarchical Decoder (HHD), a dual-decoder architecture that decouples user intent modeling from ad generation, achieving a balance between training efficiency and inference flexibility while maintaining strong modeling capacity. Finally, we propose a multi-stage joint training strategy that integrates Multi-Token Prediction (MTP), Value-Aware Fine-Tuning, and the Hierarchy Enhanced Policy Optimization (HEPO) algorithm, forming a complete generative recommendation pipeline that unifies interest modeling, value alignment, and policy optimization. GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics including GMV and CTCVR.
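
To make the idea of a shared multi-level semantic ID space more concrete, the sketch below encodes item embeddings as tuples of codes, one per level. The abstract does not say how GPR builds its IDs, so this is only an illustration: residual quantization is swapped in as one common way to obtain multi-level codes, and the codebooks, embeddings, and function names (`nearest`, `semantic_id`, `CODEBOOKS`) are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def semantic_id(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Encode an embedding as one code per level by quantizing successive residuals."""
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Pass what the chosen codeword did not explain down to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Hypothetical two-level codebooks, shared by ads and organic content
# (assumption: a real system would learn these from item embeddings).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],    # level 0: coarse clusters
    [(0.0, 0.25), (0.25, 0.0)],  # level 1: finer residual clusters
]

ad_embedding = (1.0, 1.2)       # made-up embedding of an ad
video_embedding = (0.9, 1.3)    # made-up embedding of an organic video
other_embedding = (0.3, 0.0)    # made-up embedding of an unrelated item

print(semantic_id(ad_embedding, CODEBOOKS))
print(semantic_id(video_embedding, CODEBOOKS))
print(semantic_id(other_embedding, CODEBOOKS))
```

In this toy setup the ad and the organic video land on the same code tuple because their embeddings are close, while the unrelated item gets a different one. That is the sense in which a shared ID space can align heterogeneous content, under the assumptions stated above.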
