Sequential recommendation aims to predict a user's next action in large-scale recommender systems. While traditional methods often suffer from insufficient information interaction, recent generative recommendation models partially address this issue by directly generating item predictions. To better capture user intent, recent studies have introduced a reasoning process into generative recommendation, significantly improving recommendation performance. However, these approaches rely on a single semantic representation per item, which limits the diversity of reasoning pathways and undermines the reliability of the reasoning process. To tackle these issues, we introduce REG4Rec, a reasoning-enhanced generative model that constructs multiple dynamic semantic reasoning paths together with a self-reflection process to ensure high-confidence recommendations. Specifically, REG4Rec uses a Mixture-of-Experts-based parallel quantization codebook (MPQ) to generate multiple unordered semantic tokens for each item, thereby constructing a larger and more diverse reasoning space. To further enhance reasoning reliability, we propose a reasoning-enhancement training stage comprising Preference Alignment for Reasoning (PARS) and a Multi-Step Reward Augmentation (MSRA) strategy. PARS uses reward functions tailored to recommendation to strengthen reasoning and reflection, while MSRA introduces future multi-step actions to improve overall generalization. During inference, we propose Consistency-Oriented Self-Reflection for Pruning (CORP), which discards inconsistent reasoning paths to prevent erroneous reasoning from propagating. Finally, we develop an efficient offline training strategy for large-scale recommendation. Experiments on real-world datasets and online evaluations show that REG4Rec delivers strong performance and substantial practical value.
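To make the MPQ idea concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of quantizing an item embedding with several parallel codebooks, each assignment producing one semantic token; the dimensions, the `mpq_tokens` helper, and the random codebooks are all hypothetical stand-ins for the learned MoE codebooks described above.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K, C = 16, 4, 256          # embedding dim, number of parallel codebooks, codes per book
codebooks = rng.normal(size=(K, C, D))  # hypothetical learned codebooks

def mpq_tokens(item_emb):
    """Quantize one item embedding into K unordered semantic tokens,
    one nearest-neighbor code index per parallel codebook."""
    tokens = []
    for k in range(K):
        dists = np.linalg.norm(codebooks[k] - item_emb, axis=1)  # distance to each code
        tokens.append(int(dists.argmin()))                       # pick the closest code
    return tokens  # an unordered set of K code indices

emb = rng.normal(size=D)
print(mpq_tokens(emb))
```

Because the K tokens come from independent codebooks rather than a single hierarchical one, any subset of them can anchor a reasoning step, which is what enlarges the space of possible reasoning paths.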
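The CORP step at inference time can be sketched as a simple consistency vote over sampled reasoning paths; the `corp_prune` helper, the path dictionaries, and the `min_agree` threshold below are hypothetical illustrations of the pruning principle, not the paper's exact procedure.

```python
from collections import Counter

def corp_prune(paths, min_agree=2):
    """Keep only reasoning paths whose final prediction is supported by at
    least `min_agree` independent paths; inconsistent paths are discarded
    so their errors cannot propagate into the final recommendation."""
    votes = Counter(p["prediction"] for p in paths)
    return [p for p in paths if votes[p["prediction"]] >= min_agree]

paths = [
    {"prediction": "item_A", "score": 0.9},
    {"prediction": "item_A", "score": 0.7},
    {"prediction": "item_B", "score": 0.8},
]
print(corp_prune(paths))  # only the two item_A paths survive
```

A majority-style filter like this trades a little recall for reliability: a prediction reached by only one reasoning path is treated as unverified and pruned.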