Climber-Pilot: A Non-Myopic Generative Recommendation Model Towards Better Instruction-Following

Generative retrieval has emerged as a promising paradigm in recommender systems, offering stronger sequence modeling than traditional dual-tower architectures. However, in large-scale industrial scenarios, such models often suffer from inherent myopia: because of single-step inference and strict latency constraints, they tend to collapse diverse user intents into locally optimal predictions and fail to capture long-horizon, multi-item consumption patterns. Moreover, real-world retrieval systems must follow explicit retrieval instructions, such as category-level control and policy constraints. Incorporating such instruction-following behavior into generative retrieval remains challenging, as existing conditioning or post-hoc filtering approaches often compromise relevance or efficiency. In this work, we present Climber-Pilot, a unified generative retrieval framework that addresses both limitations. First, we introduce Time-Aware Multi-Item Prediction (TAMIP), a novel training paradigm designed to mitigate this myopia. By distilling long-horizon, multi-item foresight into model parameters through time-aware masking, TAMIP alleviates locally optimal predictions while preserving efficient single-step inference. Second, to support flexible instruction-following retrieval, we propose Condition-Guided Sparse Attention (CGSA), which incorporates business constraints directly into the generative process via sparse attention, without introducing additional inference steps. Extensive offline experiments and online A/B testing at NetEase Cloud Music, one of the largest music streaming platforms, demonstrate that Climber-Pilot significantly outperforms state-of-the-art baselines, achieving a 4.24% lift in the core business metric.
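
The abstract does not spell out how CGSA constructs its sparse attention pattern. As a rough illustration only, the following sketch shows one generic way a category constraint could be folded into attention as a mask, so that the constraint is applied inside the forward pass rather than by filtering candidates afterwards. The function name `condition_masked_attention`, the category labels, and the masking rule (attend only to keys whose category is allowed) are all assumptions made for this example and should not be read as the paper's actual design.

```python
import math


def condition_masked_attention(query, keys, values, key_categories, allowed):
    """Single-query scaled dot-product attention over condition-matching keys.

    Hypothetical illustration: keys whose category is not in `allowed`
    receive zero attention weight. This is NOT the paper's CGSA definition.
    """
    scale = math.sqrt(len(query))

    # Score only the keys that satisfy the condition; mask out the rest.
    scores = []
    for key, category in zip(keys, key_categories):
        if category in allowed:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        else:
            scores.append(None)  # masked position

    kept = [s for s in scores if s is not None]
    if not kept:
        raise ValueError("no key satisfies the condition")

    # Numerically stable softmax over the unmasked positions only.
    peak = max(kept)
    exps = [0.0 if s is None else math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Toy history of three items; the (assumed) instruction is "rock only".
weights, output = condition_masked_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]],
    key_categories=["rock", "jazz", "rock"],
    allowed={"rock"},
)
```

In this toy setup the masked "jazz" item contributes nothing to the output, and the mask adds no extra inference step because it is applied within the same attention computation.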
