LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) prompting across many tasks, yet the inefficiency of token-by-token generation hinders their deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective alternative in LLMs: it performs multi-step inference in a continuous hidden-state space, achieving stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three distinct challenges: (1) a gap between Semantic ID (SID) symbols and continuous latent reasoning, because SIDs carry no pre-trained semantics, which hinders joint optimization; (2) representation drift caused by the lack of supervision over the reasoning chain; and (3) the suboptimality of a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge the SID gap with two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment: a step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts the reasoning depth for each sample. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to allocate reasoning steps dynamically. This nearly halves the average number of latent steps while also improving recommendation quality. Experiments on three real-world datasets show that LASAR outperforms all baselines. It adds only marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
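
To make the alignment objective concrete, the following is a minimal Python sketch of a step-wise bidirectional KL term and its terminal-only variant. It is illustrative only and is not taken from the LASAR implementation: the abstract does not specify how hidden states are turned into distributions, so the softmax normalization over the hidden dimension, the averaging over steps, and all function names (`bidirectional_kl`, `stepwise_alignment_loss`, `terminal_alignment_loss`) are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(latent, anchor):
    """KL(p || q) + KL(q || p) between softmax-normalized vectors.

    Assumption: hidden states are mapped to distributions with a softmax;
    the source does not state which normalization is used.
    """
    p, q = softmax(latent), softmax(anchor)
    return kl(p, q) + kl(q, p)


def stepwise_alignment_loss(latent_steps, anchor_steps):
    """Average bidirectional KL over paired latent steps and CoT anchors."""
    if len(latent_steps) != len(anchor_steps):
        raise ValueError("each latent step needs exactly one anchor")
    total = sum(bidirectional_kl(l, a) for l, a in zip(latent_steps, anchor_steps))
    return total / len(latent_steps)


def terminal_alignment_loss(latent_steps, final_anchor):
    """Align only the last latent step, so the step count may vary per sample."""
    return bidirectional_kl(latent_steps[-1], final_anchor)


if __name__ == "__main__":
    latent = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
    anchors = [[3.0, 2.0, 1.0], [0.1, 0.9, 0.5]]
    print(stepwise_alignment_loss(latent, anchors))
    print(terminal_alignment_loss(latent, anchors[-1]))
```

Under these assumptions, the step-wise loss requires one anchor per latent step, which presupposes a fixed depth, whereas the terminal-only loss depends on the final step alone and therefore remains defined when the Policy Head selects a different depth for each sample.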
