OxygenREC: An Instruction-Following Generative Framework for E-commerce Recommendation

Xuegang Hao, Ming Zhang, Alex Li, Xiangyu Qian, Zhi Ma, Yanlong Zang, Shijie Yang, Zhongxuan Han, Xiaolong Ma, Jinguang Liu, Zhen Li, Zhida Jiang, Shusheng Wang, Ning Tang, Yanchen Qiao, Chenxiang Yang, Chen Sun, Jincheng Yuan, Chunhua Peng, Heng Hu, Peijun Yang, Baopeng Yuan, Caiyun Qiu, Zhaolong Xing, Haofei Yuan, Haipeng Zhang, Yuzhang Guo, Weijie Ding, Jiahua Gao, Hao Huang, Zhen Chen, Tongxuan Liu, Pinghua Gong

From arXiv; 37 pages, 7 figures

Traditional recommendation systems suffer from inconsistent optimization objectives across their multiple stages. Generative Recommendation (GR) mitigates this inconsistency through an end-to-end framework; however, existing methods still rely on matching mechanisms based on inductive patterns. Although responsive, they cannot uncover complex user intents that require deductive reasoning grounded in world knowledge. Meanwhile, large language models (LLMs) show strong deep-reasoning capabilities, but their latency and computational cost remain obstacles to industrial deployment. More critically, multi-scenario scalability is a performance bottleneck: as shown in Figure 1, existing solutions require independent training and deployment for each scenario, which leads to low resource utilization and high maintenance costs, a challenge not yet addressed in the GR literature. To address these challenges, we present OxygenREC, an industrial recommendation system that leverages Fast-Slow Thinking to deliver deep reasoning while meeting the strict latency and multi-scenario requirements of real-world environments. First, we adopt a Fast-Slow Thinking architecture: slow thinking uses a near-line LLM pipeline to synthesize Contextual Reasoning Instructions, while fast thinking employs a high-efficiency encoder-decoder backbone for real-time generation. Second, to ensure that reasoning instructions effectively enhance recommendation generation, we introduce a semantic alignment mechanism that uses Instruction-Guided Retrieval (IGR) to filter intent-relevant historical behaviors and a Query-to-Item (Q2I) loss to enforce instruction-item consistency. Finally, to achieve multi-scenario scalability, we transform scenario information into controllable instructions and use unified reward mapping and Soft Adaptive Group Clip Policy Optimization (SA-GCPO) to align policies with diverse business objectives, realizing a train-once-deploy-everywhere paradigm.
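The abstract does not say how IGR scores historical behaviors against an instruction, nor what form the Q2I loss takes. The following is a minimal sketch under two stated assumptions, not the authors' implementation. First, IGR is assumed to keep the top-k behaviors whose precomputed embeddings are most cosine-similar to the instruction embedding. Second, the Q2I loss is assumed to be a standard in-batch softmax (InfoNCE) contrastive loss between instruction and item embeddings. The function names `instruction_guided_retrieval` and `q2i_loss` and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def instruction_guided_retrieval(instruction: Vector,
                                 history: List[Vector],
                                 k: int) -> List[int]:
    """Hypothetical IGR sketch (assumption: cosine top-k filtering).

    Keeps the k behaviors most similar to the instruction embedding and
    returns their indices in the original (chronological) order.
    """
    scores = [cosine(instruction, h) for h in history]
    top = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def q2i_loss(queries: List[Vector], items: List[Vector],
             temperature: float = 0.1) -> float:
    """Hypothetical Q2I sketch (assumption: in-batch InfoNCE loss).

    queries[i] and items[i] form the positive pair; every other item in the
    batch serves as a negative. Returns the mean negative log-likelihood.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, it) / temperature for it in items]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(instruction_guided_retrieval([1.0, 0.0], history, k=2))  # [0, 2]
```

Returning the retained indices in chronological order (rather than by score) is a design choice in this sketch. It keeps the filtered behavior sequence usable as ordered input to a sequence encoder. Under the InfoNCE assumption, correctly aligned instruction-item pairs yield a lower `q2i_loss` than mismatched ones. This is the consistency signal the abstract attributes to the Q2I loss.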
