While the OneRec series has successfully unified the fragmented recommendation pipeline into an end-to-end generative framework, a significant gap remains between recommender systems and general intelligence. Constrained by isolated data, they operate as domain specialists: proficient in pattern matching but lacking world knowledge, reasoning capabilities, and instruction following. This limitation is further compounded by the lack of a holistic benchmark to evaluate such integrated capabilities. To address this, our contributions are: 1) RecIF-Bench & Open Data: We propose RecIF-Bench, a holistic benchmark covering 8 diverse tasks that thoroughly evaluate capabilities ranging from fundamental prediction to complex reasoning. Concurrently, we release a massive training dataset comprising 96 million interactions from 160,000 users to facilitate reproducible research. 2) Framework & Scaling: To ensure full reproducibility, we open-source our comprehensive training pipeline, encompassing data processing, co-pretraining, and post-training. Leveraging this framework, we demonstrate that recommendation capabilities scale predictably while catastrophic forgetting of general knowledge is mitigated. 3) OneRec Foundation: We release OneRec Foundation (1.7B and 8B), a family of models that establishes new state-of-the-art (SOTA) results across all tasks in RecIF-Bench. Furthermore, when transferred to the Amazon benchmark, our models surpass the strongest baselines by an average of 26.8% in Recall@10 across 10 diverse datasets (Figure 1). This work marks a step towards building truly intelligent recommender systems. Nonetheless, realizing this vision presents significant technical and theoretical challenges, highlighting the need for broader research engagement in this promising direction.
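For context, the Amazon transfer results above are reported as Recall@10, averaged over users and datasets. The sketch below shows how Recall@K is typically computed for a single user; the function name and item IDs are illustrative and are not taken from the released pipeline.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Recall@K for one user: the fraction of that user's held-out relevant
    items that appear among the top-K recommended items."""
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in relevant_items if item in top_k)
    return hits / len(relevant_items)

# Toy usage with hypothetical item IDs: 2 of the 3 relevant items fall in the top 10.
ranked = ["i7", "i2", "i9", "i4", "i1", "i8", "i3", "i6", "i5", "i0", "i11"]
relevant = {"i2", "i1", "i12"}
print(recall_at_k(ranked, relevant, k=10))  # 0.666...
```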