Recent breakthroughs in generative AI have transformed recommender systems through end-to-end generation. OneRec reformulates recommendation as an autoregressive generation task, achieving high Model FLOPs Utilization. While OneRec-V1 has shown significant empirical success in real-world deployment, two critical challenges hinder its scalability and performance: (1) inefficient computational allocation, with 97.66% of resources consumed by sequence encoding rather than generation, and (2) reinforcement learning that relies solely on learned reward models rather than real user feedback. To address these challenges, we propose OneRec-V2, featuring: (1) a Lazy Decoder-Only Architecture that eliminates the encoder bottleneck, reducing total computation by 94% and training resources by 90% and enabling successful scaling to 8B parameters; and (2) Preference Alignment with Real-World User Interactions, which incorporates Duration-Aware Reward Shaping and Adaptive Ratio Clipping to better align the model with user preferences using real-world feedback. Extensive A/B tests on Kuaishou demonstrate OneRec-V2's effectiveness, improving App Stay Time by 0.467%/0.741% while balancing multi-objective recommendations. This work advances the scalability of generative recommendation and its alignment with real-world feedback, representing a step forward in the development of end-to-end recommender systems.
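To make the two preference-alignment components concrete, the sketch below illustrates one plausible reading of them: a duration-aware reward that normalizes watch time by video duration so long videos do not dominate the signal, and a PPO-style clipped objective whose clip range is scaled per sample. Both formulas, the function names, and all parameters (`alpha`, `base_eps`, `scale`) are illustrative assumptions, not the paper's actual equations.

```python
import numpy as np

def duration_aware_reward(watch_time, video_duration, alpha=0.5):
    # Hypothetical shaping (assumption, not the paper's formula):
    # reward the completion ratio, concave in alpha so that partial
    # watches of short videos still contribute signal.
    completion = np.clip(watch_time / np.maximum(video_duration, 1e-6), 0.0, 1.0)
    return completion ** alpha

def adaptive_clip_objective(ratio, advantage, base_eps=0.2, scale=1.0):
    # Hypothetical adaptive ratio clipping (illustrative only):
    # a standard PPO clipped surrogate where the clip half-width
    # eps is rescaled per sample/batch instead of staying fixed.
    eps = base_eps * scale
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

With `alpha=1.0` the reward is simply the completion ratio; with `scale=1.0` the objective reduces to vanilla PPO clipping, so the adaptive variant degrades gracefully to the standard surrogate.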