OpenOneRec Technical Report

Guorui Zhou, Honghui Bao, Jiaming Huang, Jiaxin Deng, Jinghao Zhang, Junda She, Kuo Cai, Lejian Ren, Lu Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rongzhou Zhang, Ruiming Tang, Shiyao Wang, Wuchao Li, Xiangyu Wu, Xinchen Luo, Xingmei Wang, Yifei Hu, Yunfan Wu, Zhanyu Liu, Zhiyang Zhang, Zixing Zhang, Bo Chen, Bin Wen, Chaoyi Ma, Chengru Song, Chenglong Chu, Defu Lian, Fan Yang, Feng Jiang, Hongtao Cheng, Huanjie Wang, Kun Gai, Pengfei Zheng, Qiang Wang, Rui Huang, Siyang Mao, Tingting Gao, Wei Yuan, Yan Wang, Yang Zhou, Yi Su, Zexuan Cheng, Zhixin Ling, Ziming Li

While the OneRec series has successfully unified the fragmented recommendation pipeline into an end-to-end generative framework, a significant gap remains between recommendation systems and general intelligence. Constrained by isolated data, these systems operate as domain specialists: proficient in pattern matching but lacking world knowledge, reasoning capabilities, and instruction-following ability. This limitation is compounded by the lack of a holistic benchmark for evaluating such integrated capabilities. To address this, we make three contributions. 1) RecIF-Bench & Open Data: We propose RecIF-Bench, a holistic benchmark covering 8 diverse tasks that evaluate capabilities ranging from fundamental prediction to complex reasoning. Concurrently, we release a large-scale training dataset comprising 96 million interactions from 160,000 users to facilitate reproducible research. 2) Framework & Scaling: To ensure full reproducibility, we open-source our complete training pipeline, encompassing data processing, co-pretraining, and post-training. Leveraging this framework, we demonstrate that recommendation capabilities can scale predictably while mitigating catastrophic forgetting of general knowledge. 3) OneRec-Foundation: We release OneRec-Foundation (1.7B and 8B), a family of models establishing new state-of-the-art (SOTA) results across all tasks in RecIF-Bench. Furthermore, when transferred to the Amazon benchmark, our models surpass the strongest baselines with an average 26.8% improvement in Recall@10 across 10 diverse datasets (Figure 1). This work marks a step towards building truly intelligent recommender systems. Nonetheless, realizing this vision presents significant technical and theoretical challenges, highlighting the need for broader research engagement in this promising direction.
