S$^2$GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation

Generative Recommendation (GR) has emerged as a transformative paradigm thanks to its end-to-end generation advantages. However, existing GR methods primarily generate Semantic IDs (SIDs) directly from interaction sequences, failing to activate deeper reasoning capabilities analogous to those of large language models and thus limiting their performance potential. We identify two critical limitations in current reasoning-enhanced GR approaches: (1) strict sequential separation between reasoning and generation steps creates imbalanced computational focus across hierarchical SID codes, degrading the quality of the generated SID codes; (2) generated reasoning vectors lack interpretable semantics, while reasoning paths suffer from unverifiable supervision. In this paper, we propose stepwise semantic-guided reasoning in latent space (S$^2$GR), a novel reasoning-enhanced GR framework. First, we establish a robust semantic foundation via codebook optimization, integrating item co-occurrence relationships to capture behavioral patterns, together with load-balancing and uniformity objectives that maximize codebook utilization while reinforcing coarse-to-fine semantic hierarchies. Our core innovation is a stepwise reasoning mechanism that inserts thinking tokens before each SID generation step. Each thinking token explicitly represents coarse-grained semantics and is supervised via contrastive learning against ground-truth codebook cluster distributions, ensuring physically grounded reasoning paths and balanced computational focus across all SID codes. Extensive experiments demonstrate the superiority of S$^2$GR, and an online A/B test confirms its efficacy on a large-scale industrial short-video platform.
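To make the stepwise mechanism concrete, the following is a minimal sketch and not the authors' implementation. It assumes one thinking token per SID code, a hypothetical placeholder id `THINK`, and an InfoNCE-style loss that compares a thinking vector with coarse cluster centroids under a target cluster distribution. The function names, the cosine similarity, and the temperature value are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

THINK = -1  # hypothetical placeholder id for a thinking token (assumption)


def interleave_thinking(sid_codes):
    """Insert one thinking token before each SID code.

    [c1, c2, c3] becomes [THINK, c1, THINK, c2, THINK, c3].
    One token per step is an assumption made for illustration.
    """
    out = []
    for code in sid_codes:
        out.extend([THINK, code])
    return out


def contrastive_loss(think_vec, centroids, target_dist, temperature=0.1):
    """Cross-entropy between a target cluster distribution and the softmax
    over cosine similarities of the thinking vector to each centroid.

    With a one-hot target_dist this reduces to the usual InfoNCE form.
    """
    v = think_vec / np.linalg.norm(think_vec)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (c @ v) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-(target_dist * log_probs).sum())


# Toy usage: three coarse clusters with orthogonal centroids.
centroids = np.eye(3)
think_vec = np.array([1.0, 0.0, 0.0])
loss_match = contrastive_loss(think_vec, centroids, np.array([1.0, 0.0, 0.0]))
loss_mismatch = contrastive_loss(think_vec, centroids, np.array([0.0, 1.0, 0.0]))
```

In this toy setting the loss is smaller when the thinking vector aligns with the target cluster than when it does not, which is the behavior such supervision is meant to encourage at every SID generation step.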
