Podcast listening is often grounded in a set of favorite shows, while listener intent can evolve over time. This combination of stable preferences and changing intent motivates recommendation approaches that support both familiarity and exploration. Traditional recommender systems typically emphasize long-term interaction patterns and are less explicitly designed to incorporate rich contextual signals or flexible, intent-aware discovery objectives. In this setting, models that can jointly reason over semantics, context, and user state offer a promising direction. Large Language Models (LLMs) provide strong semantic reasoning and contextual conditioning for discovery-oriented recommendation, but deploying them in production introduces challenges in catalog grounding, user-level personalization, and latency-critical serving. We address these challenges with GLIDE, a production-scale generative recommender for podcast discovery at Spotify. GLIDE formulates recommendation as an instruction-following task over a catalog discretized into Semantic IDs, enabling grounded generation over a large inventory. The model conditions on recent listening history and lightweight user context, and injects long-term user embeddings as soft prompts to capture stable preferences under strict inference constraints. We evaluate GLIDE using offline retrieval metrics, human judgments, and LLM-based evaluation, and validate its impact through large-scale online A/B testing. Across experiments involving millions of users, GLIDE increases non-habitual podcast streaming on the Spotify home surface by up to 5.4% and new-show discovery by up to 14.3%, while meeting production cost and latency constraints.