Large-scale live-streaming recommendation requires precise modeling of non-stationary content semantics under strict real-time serving constraints. In industrial deployment, two common approaches exhibit fundamental limitations: discrete semantic abstractions sacrifice descriptive precision through clustering, while dense multimodal embeddings are extracted independently and remain weakly aligned with ranking optimization, limiting fine-grained content-aware ranking. To address these limitations, we propose \textbf{SARM}, an end-to-end ranking architecture that integrates natural-language semantic anchors directly into ranking optimization, enabling fine-grained author representations conditioned on multimodal content. Each semantic anchor is represented as learnable text tokens jointly optimized with ranking features, allowing the model to adapt content descriptions to ranking objectives. A lightweight dual-token gated design captures domain-specific live-streaming semantics, while an asymmetric deployment strategy preserves low-latency online training and serving. Extensive offline evaluation and large-scale A/B tests show consistent improvements over production baselines. SARM is fully deployed and serves over 400 million users daily.