Learning from user interaction history through sequential models has become a cornerstone of large-scale recommender systems. Recent advances in large language models have revealed promising scaling laws, sparking a surge of research into long-sequence modeling and deeper architectures for recommendation tasks. However, many recent approaches rely heavily on cross-attention mechanisms to sidestep the quadratic computational bottleneck of sequential modeling, which can limit the representational power gained from self-attention. We present ULTRA-HSTU, a novel sequential recommendation model developed through end-to-end model-system co-design. By innovating in the design of input sequences, sparse attention mechanisms, and model topology, ULTRA-HSTU achieves substantial improvements in both model quality and efficiency. Comprehensive benchmarking demonstrates that ULTRA-HSTU delivers remarkable scaling efficiency -- over 5x faster training scaling and 21x faster inference scaling than conventional models -- while achieving superior recommendation quality. Our solution is fully deployed at scale, serving billions of users daily and driving 4% to 8% improvements in consumption and engagement in real-world production environments.