OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu, Dunju Zang, Fei Pan, Han Li, Hao Jiang, Honghui Bao, Huanjie Wang, Jian Liang, Jiangxia Cao, Jiao Ou, Jiaxin Deng, Jinghao Zhang, Kun Gai, Lu Ren, Peiru Du, Pengfei Zheng, Rongzhou Zhang, Ruiming Tang, Shiyao Wang, Siyang Mao, Siyuan Lou, Teng Shi, Wei Yuan, Wenlong Xu, Xingchen Liu, Xingmei Wang, Xinqi Jin, Yan Sun, Yan Wang, Yifei Hu, Yingzhi He, Yufei Ye, Yuhao Wang, Yunhao Zhou, Yuqin Dai, Zhao Liu, Zhipeng Wei, Zhixin Ling, Ziming Li, Zixing Zhang, Ziyuan Liu, An Zhang, Changxin Lao, Chaoyi Ma, Chengru Song, Defu Lian, Fan Yang, Guowang Zhang, Hao Peng, Jiayao Shen, Jie Chen, Jun Xu, Junmin Chen, Kun Zhang, Kuo Cai, Mingxing Wen, Minmao Wang, Minxuan Lv, Qi Zhang, Qiang Luo, Sheng Yu, Shijie Li, Shijie Yi, Shuang Yang, Shugui Liu, Shuni Chen, Tinghai Zhang, Tingting Gao, Xiang Wang, Xiangyu Wu, Xiangyu Zhao, Xiao Lv, Xiaoyou Zhou, Xuming Wang, Yong Du, Zejian Zhang, Zhaojie Liu, Zhiyang Zhang, Zhuang Zhuang, Ziqi Wang, Ziyi Zhao

From arXiv. Work in progress.

Generative recommendation models in the OneRec family have been widely deployed in many real-world services, such as short video, live streaming, advertising, and e-commerce. However, these generative models benefit only from scaling, while their reasoning ability is hard to activate, since meaningful Chain-of-Thought (CoT) sequences cannot be constructed from itemic tokens alone. Inspired by the success of the "think before answer" reasoning paradigm in the LLM field, we conduct preliminary studies (i.e., OneRec-Think, OpenOneRec) to explore reasoning capability in generative recommendation. Nevertheless, we notice an unexpected phenomenon: the thinking mode shows no advantage over the non-thinking mode. Drawing on recent findings on CoT robustness in multi-modal language models, we argue that effective reasoning in recommendation rests on two factors: perception, the ability to ground itemic tokens in their underlying language semantics, and cognition, the ability to reorganize a user's behavior sequence into coherent latent interest points. We therefore propose OneReason, which includes: (1) strong itemic token perception in pre-training, (2) a three-level cognition-enhanced CoT format for recommendation tasks in SFT, and (3) a specialize-then-unify training recipe in RL to enhance the thinking ability.

