Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify

Edoardo D'Amico, Marco De Nadai, Praveen Chandar, Divita Vohra, Shawn Lin, Max Lefarov, Paul Gigioli, Gustavo Penha, Ilya Kopysitsky, Ivo Joel Senese, Darren Mei, Francesco Fabbri, Oguz Semerci, Yu Zhao, Vincent Tang, Brian St. Thomas, Alexandra Ranieri, Matthew N. K. Smith, Aaron Bernkopf, Bryan Leung, Ghazal Fazelnia, Mark VanMiddlesworth, Timothy Christopher Heath, Petter Pehrson Skiden, Alice Y. Wang, Doug J. Cole, Andreas Damianou, Maya Hristakeva, Reid Wilbur, Tarun Chillara, Vladan Radosavljevic, Pooja Chitkara, Sainath Adapa, Juan Elenter, Bernd Huber, Jacqueline Wood, Saaketh Vedantam, Jan Stypka, Sandeep Ghael, Martin D. Gould, David Murgatroyd, Yves Raimond, Mounia Lalmas, Paul N. Bennett

Podcast listening is often grounded in a set of favorite shows, while listener intent can evolve over time. This combination of stable preferences and changing intent motivates recommendation approaches that support both familiarity and exploration. Traditional recommender systems typically emphasize long-term interaction patterns and are less explicitly designed to incorporate rich contextual signals or flexible, intent-aware discovery objectives. In this setting, models that can jointly reason over semantics, context, and user state offer a promising direction. Large Language Models (LLMs) provide strong semantic reasoning and contextual conditioning for discovery-oriented recommendation, but deploying them in production introduces challenges in catalog grounding, user-level personalization, and latency-critical serving. We address these challenges with GLIDE, a production-scale generative recommender for podcast discovery at Spotify. GLIDE formulates recommendation as an instruction-following task over a discretized catalog using Semantic IDs, enabling grounded generation over a large inventory. The model conditions on recent listening history and lightweight user context, while injecting long-term user embeddings as soft prompts to capture stable preferences under strict inference constraints. We evaluate GLIDE using offline retrieval metrics, human judgments, and LLM-based evaluation, and validate its impact through large-scale online A/B testing. Across experiments involving millions of users, GLIDE increases non-habitual podcast streaming on the Spotify home surface by up to 5.4% and new-show discovery by up to 14.3%, while meeting production cost and latency constraints.
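
To make the idea of a discretized catalog more concrete, the following is a minimal sketch of how Semantic IDs could be derived and used in an instruction-style prompt. The abstract does not say how GLIDE builds its Semantic IDs; residual quantization over content embeddings is assumed here because it is a common choice for this kind of system. The codebooks, embeddings, token format, prompt wording, and all function names are hypothetical, and the soft-prompt injection of long-term user embeddings is left out because it requires a model.

```python
# Illustrative sketch only: residual quantization is ASSUMED here, and every
# name and number below is hypothetical; none of it is taken from GLIDE.

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def semantic_id(embedding, codebooks):
    """Quantize an embedding level by level; each level encodes the residual
    left over by the levels before it."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


def to_tokens(codes):
    """Render codes as level-tagged tokens, e.g. (1, 1) -> '<sid_0_1><sid_1_1>'."""
    return "".join(f"<sid_{level}_{code}>" for level, code in enumerate(codes))


def build_prompt(history, context, codebooks):
    """Instruction-style prompt: user context plus recent listens as Semantic ID tokens."""
    listened = " ".join(to_tokens(semantic_id(e, codebooks)) for e in history)
    return (
        "Recommend a podcast this listener has not played before.\n"
        f"Context: {context}\n"
        f"Recently played: {listened}\n"
        "Recommendation:"
    )


def ground(generated_codes, catalog):
    """Map a generated Semantic ID back to catalog items; unknown IDs yield no items."""
    return catalog.get(tuple(generated_codes), [])


# Toy two-level codebooks over 2-d embeddings (made-up values).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]],
]

# Toy catalog: show name -> content embedding (made-up values).
SHOWS = {
    "show_a": [1.15, 0.85],
    "show_b": [0.0, 0.1],
}

# Index the catalog by Semantic ID so generated IDs can be resolved to shows.
CATALOG = {}
for name, emb in SHOWS.items():
    CATALOG.setdefault(semantic_id(emb, CODEBOOKS), []).append(name)

PROMPT = build_prompt([SHOWS["show_b"]], "weekday morning, mobile", CODEBOOKS)
```

In this sketch, grounding amounts to a dictionary lookup: a generated code tuple either resolves to known shows or to an empty list. A production system would more plausibly constrain decoding so that only valid Semantic IDs can be generated, but that is also an assumption rather than something stated in the abstract.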

