Generative recommendation models in the OneRec family are widely deployed in real-world services such as short-video, live-streaming, advertising, and e-commerce. However, these models benefit only from scaling; their reasoning ability is hard to activate, because meaningful Chain-of-Thought (CoT) sequences cannot be constructed from itemic tokens alone. Inspired by the success of the ``think before answer'' reasoning paradigm in the LLM field, we conducted preliminary studies (namely OneRec-Think and OpenOneRec) to explore reasoning capability in generative recommendation. We observed an unexpected phenomenon, however: the thinking mode shows no advantage over the non-thinking mode. Drawing on recent findings about CoT robustness in multi-modal language models, we argue that effective reasoning in recommendation rests on two factors: perception, the ability to ground itemic tokens in their underlying language semantics, and cognition, the ability to reorganize a user's behavior sequence into coherent latent interest points. We therefore propose OneReason, which comprises: (1) strong itemic-token perception in pre-training, (2) a three-level cognition-enhanced CoT format for recommendation tasks in SFT, and (3) a specialize-then-unify training recipe in RL to enhance thinking ability.