SSRLive: Live Streaming Recommendation with Dynamic Semantic ID

Live streaming has emerged as one of the fastest-growing forms of online media, enabling instant content broadcasting and real-time engagement between users and streamers. Existing recommendation algorithms are effective in this domain, but they often underutilize the available computational resources: their low FLOPs limit further performance gains. Generative recommendation techniques, which have gained traction in various industrial tasks, offer a promising avenue for improving live streaming recommendation. However, directly applying generative methods to live streaming is non-trivial due to two major challenges: (1) static semantic IDs (SIDs) cannot reflect the rapidly changing content of a live room; and (2) generative pipelines generally do not incorporate user-streamer interaction signals (e.g., likes, orders), which are critical for modeling user intent toward both the streamer and the showcased products. To address these challenges, we introduce SSRLive: Dynamic Semantic ID-guided Streaming Recommendation for Live platforms. The proposed framework integrates a generative module and a discriminative module in a unified architecture. The generative module uses an encoder-decoder design to produce both static and dynamic SIDs, giving a timely representation of live room content while leveraging multimodal information. The discriminative module refines task-specific representations by combining SIDs with user features, augments them with user-streamer interaction data, and performs multi-task prediction. Online A/B tests in real-world deployment show gains in watch time (+3.38%), GMV (+0.72%), follower growth (+3.12%), and interaction volume (+2.92%). These improvements demonstrate the effectiveness and business value of SSRLive, which is now fully deployed and serves hundreds of millions of active users.
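To make the distinction between static and dynamic SIDs concrete, the following is a minimal sketch, not the SSRLive implementation. The abstract does not say how SIDs are computed; the sketch assumes residual quantization, a common way to derive semantic IDs in generative recommendation, in place of the paper's encoder-decoder module. The codebooks, embeddings, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def semantic_id(embedding: Vector, codebooks: Sequence[Codebook]) -> Tuple[int, ...]:
    """Residual quantization (assumed technique): pick one code per level,
    subtracting the chosen entry before moving to the next, finer level."""
    residual: List[float] = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


def mean_embedding(frames: Sequence[Vector]) -> List[float]:
    """Average a window of recent content embeddings into one vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


# Hypothetical two-level codebooks: a coarse level and a fine level.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[-0.25, -0.25], [0.25, -0.25], [-0.25, 0.25], [0.25, 0.25]],
]

# Static SID: computed once from a long-lived embedding of the live room
# (hypothetical values), so it does not change during a broadcast.
STATIC_SID = semantic_id([0.9, 0.1], CODEBOOKS)

# Dynamic SID: recomputed from a sliding window of recent content embeddings
# (hypothetical values), so it follows what the room is showing right now.
RECENT_FRAMES = [[0.2, 1.0], [0.1, 1.2], [0.3, 1.1]]
DYNAMIC_SID = semantic_id(mean_embedding(RECENT_FRAMES), CODEBOOKS)
```

In this toy setup the static and dynamic SIDs of the same room differ, which is the situation the first challenge describes: an ID derived once from stable attributes can diverge from the content currently on screen. In a system like the one described, both IDs would then be passed to the discriminative module together with user features and interaction signals.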
