S$^2$GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation

Generative Recommendation (GR) has emerged as a transformative paradigm thanks to its end-to-end generation advantages. However, existing GR methods primarily generate Semantic IDs (SIDs) directly from interaction sequences, failing to activate deeper reasoning capabilities analogous to those of large language models and thus limiting their performance potential. We identify two critical limitations in current reasoning-enhanced GR approaches: (1) strict sequential separation between reasoning and generation steps creates an imbalanced computational focus across hierarchical SID codes, degrading the quality of the generated SID codes; (2) the generated reasoning vectors lack interpretable semantics, and the reasoning paths lack verifiable supervision. In this paper, we propose stepwise semantic-guided reasoning in latent space (S$^2$GR), a novel reasoning-enhanced GR framework. First, we establish a robust semantic foundation via codebook optimization, which integrates item co-occurrence relationships to capture behavioral patterns and adds load-balancing and uniformity objectives that maximize codebook utilization while reinforcing coarse-to-fine semantic hierarchies. Our core innovation is a stepwise reasoning mechanism that inserts thinking tokens before each SID generation step. Each thinking token explicitly represents coarse-grained semantics and is supervised via contrastive learning against ground-truth codebook cluster distributions, ensuring physically grounded reasoning paths and a balanced computational focus across all SID codes. Extensive experiments demonstrate the superiority of S$^2$GR, and an online A/B test confirms its efficacy on a large-scale industrial short-video platform.
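To make the stepwise mechanism concrete, the following is a minimal Python sketch, not the S$^2$GR implementation. It illustrates two ideas from the abstract: interleaving thinking slots with SID code slots, and a contrastive-style loss that pulls a thinking vector toward its coarse codebook cluster. The function names, the single thinking slot per SID level, the cosine similarity, the temperature value, and the use of one hard target cluster (rather than a cluster distribution) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def interleaved_layout(num_levels: int) -> List[Tuple[str, int]]:
    """Return the slot order think_0, sid_0, think_1, sid_1, ...

    Assumption: exactly one thinking slot precedes each SID level.
    """
    layout: List[Tuple[str, int]] = []
    for level in range(num_levels):
        layout.append(("think", level))  # reasoning slot for this level
        layout.append(("sid", level))    # SID code generated at this level
    return layout


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def contrastive_loss(
    thinking_vec: Sequence[float],
    cluster_centroids: Sequence[Sequence[float]],
    target_cluster: int,
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """InfoNCE-style loss over coarse cluster centroids.

    The target centroid is the positive; all other centroids are negatives.
    Assumption: a hard target index stands in for the ground-truth
    cluster distribution mentioned in the abstract.
    """
    logits = [_cosine(thinking_vec, c) / temperature for c in cluster_centroids]
    max_logit = max(logits)
    # Numerically stable log-sum-exp over all centroids.
    log_z = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_z - logits[target_cluster]


if __name__ == "__main__":
    print(interleaved_layout(3))
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss([1.0, 0.0], centroids, target_cluster=0))
    print(contrastive_loss([1.0, 0.0], centroids, target_cluster=1))
```

In this sketch the loss is small when the thinking vector aligns with its target centroid and large when it aligns with a different one. That is the sense in which each thinking slot receives a verifiable supervision signal before the corresponding SID code is generated.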
