Live-streaming recommender systems serve as critical infrastructure that mediates real-time interactions between users and authors. Like traditional industrial recommender systems, live-streaming recommendation relies on cascade architectures to support large-scale concurrency. Recent advances in generative recommendation unify the multi-stage recommendation process with Transformer-based architectures, offering improved scalability and higher computational efficiency. However, these methods do not transfer directly to live-streaming scenarios, where continuously evolving content, limited lifecycles, strict real-time constraints, and heterogeneous multi-objectives introduce unique challenges that invalidate static tokenization and conventional model frameworks. To address these issues, we propose OneLive, a dynamically unified generative recommendation framework tailored to live-streaming scenarios. OneLive integrates four key components: (i) a Dynamic Tokenizer that continuously encodes evolving real-time live content, fused with behavior signals, through residual quantization; (ii) a Time-Aware Gated Attention mechanism that explicitly models temporal dynamics for timely decision making; (iii) an efficient decoder-only generative architecture enhanced with Sequential MTP and QK Norm for stable training and accelerated inference; and (iv) a Unified Multi-Objective Alignment Framework that reinforces policy optimization for personalized preferences.
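As an illustration of the residual quantization step named in component (i), the following is a minimal sketch, not the Dynamic Tokenizer itself. It assumes small, fixed, hand-picked codebooks, whereas the abstract does not say how OneLive learns or refreshes its codebooks or how behavior signals are fused into the embedding. The names `nearest_index`, `residual_quantize`, and `reconstruct` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_index(residual: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to `residual` (squared L2)."""
    def sq_dist(codeword: Vector) -> float:
        return sum((r - c) ** 2 for r, c in zip(residual, codeword))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(
    embedding: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize `embedding` level by level.

    Each level picks the codeword nearest to the current residual, emits
    its index as one token, and passes the remaining error to the next level.
    Returns the token list and the final unquantized residual.
    """
    residual = list(embedding)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


def reconstruct(tokens: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Sum the selected codewords across levels to approximate the embedding."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Assumed toy codebooks: a coarse level followed by a finer correction level.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
tokens, residual = residual_quantize([1.2, 0.8], codebooks)
approximation = reconstruct(tokens, codebooks)
```

In this sketch each level refines the approximation left by the previous one, so an item is represented by a short coarse-to-fine token sequence. In a live-streaming setting the input embedding would presumably be recomputed as the stream's content changes, which is the part this sketch leaves out.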