SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity

Industrial recommender systems typically rely on multi-task learning to estimate diverse user feedback signals and aggregate them for ranking. Recent advances in model scaling have shown promising gains in recommendation. However, naively increasing model capacity imposes prohibitive online inference costs and often yields diminishing returns for sparse tasks with skewed label distributions. This mismatch between uniform parameter scaling and the heterogeneous capacity demands of different tasks poses a fundamental challenge for scalable multi-task recommendation. In this work, we investigate parameter sparsification as a principled scaling paradigm and identify two critical obstacles to applying sparse Mixture-of-Experts (MoE) to multi-task recommendation: (1) expert activation explosion, which undermines instance-level sparsity, and (2) expert load skew caused by independent task-wise routing. To address these challenges, we propose SMES, a scalable sparse MoE framework with progressive expert routing. SMES decomposes expert activation into a task-shared expert subset, jointly selected across tasks, and task-adaptive private experts. This explicitly bounds the number of experts executed per instance while preserving task-specific capacity. In addition, SMES introduces a global multi-gate load-balancing regularizer that stabilizes training by regulating aggregated expert utilization across all tasks. SMES has been deployed in Kuaishou's large-scale short-video services, which serve over 400 million daily active users. Extensive online experiments demonstrate stable improvements, including a 0.29% gain in GAUC and a 0.31% uplift in user watch time.
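The abstract describes progressive expert routing and the global multi-gate regularizer only at a high level. Below is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes, hypothetically, that the task-shared subset is the top `k_shared` experts by gate probability summed over all task gates, and that each task then adds its own top `k_private` experts from outside that subset. It also assumes the regularizer takes a Switch-Transformer-style form, number of experts times the sum over e of f_e·p_e, computed on assignments and gate probabilities pooled over every task gate. All function names (`progressive_route`, `global_balance_loss`, etc.) and the toy logits are illustrative. The sketch shows the following: with the same per-task budget of two experts, independent task-wise top-k routing can execute up to T·(k_shared + k_private) distinct experts per instance, whereas the shared-plus-private scheme is capped at k_shared + T·k_private.

```python
import math
from typing import List, Sequence, Tuple

Route = List[int]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one gate's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def top_k(scores: Sequence[float], k: int, exclude: Sequence[int] = ()) -> Route:
    """Indices of the k highest scores, skipping excluded experts (ties -> lower index)."""
    candidates = [e for e in range(len(scores)) if e not in exclude]
    return sorted(candidates, key=lambda e: (-scores[e], e))[:k]


def independent_route(task_logits: Sequence[Sequence[float]], k: int) -> List[Route]:
    """Baseline: every task gate picks its own top-k experts independently."""
    return [top_k(softmax(logits), k) for logits in task_logits]


def progressive_route(
    task_logits: Sequence[Sequence[float]], k_shared: int, k_private: int
) -> Tuple[List[List[float]], List[Route]]:
    """Hypothetical two-stage routing for a single instance.

    Stage 1 picks one task-shared subset from gate probabilities summed over tasks.
    Stage 2 lets each task add k_private experts from outside that subset.
    """
    probs = [softmax(logits) for logits in task_logits]
    num_experts = len(probs[0])
    aggregated = [sum(p[e] for p in probs) for e in range(num_experts)]
    shared = top_k(aggregated, k_shared)
    routes = [shared + top_k(p, k_private, exclude=shared) for p in probs]
    return probs, routes


def executed_experts(routes: Sequence[Route]) -> set:
    """Experts that must actually run for this instance (union over tasks)."""
    return {e for route in routes for e in route}


def global_balance_loss(
    batch_probs: Sequence[Sequence[Sequence[float]]],
    batch_routes: Sequence[Sequence[Route]],
    num_experts: int,
) -> float:
    """Assumed Switch-style auxiliary loss on utilization pooled over ALL task gates.

    f[e]: share of routing assignments (all instances, all tasks) sent to expert e.
    p[e]: mean gate probability of expert e (all instances, all tasks).
    """
    counts = [0.0] * num_experts
    prob_mass = [0.0] * num_experts
    num_assignments = 0
    num_gates = 0
    for probs, routes in zip(batch_probs, batch_routes):
        for gate_probs, route in zip(probs, routes):
            for e in route:
                counts[e] += 1.0
            num_assignments += len(route)
            for e in range(num_experts):
                prob_mass[e] += gate_probs[e]
            num_gates += 1
    f = [c / num_assignments for c in counts]
    p = [m / num_gates for m in prob_mass]
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


# Toy example: three tasks with mostly disjoint expert preferences over six experts.
logits = [
    [3.0, 2.5, 0.0, 0.0, 0.0, 0.0],  # task A prefers experts 0 and 1
    [0.0, 0.0, 3.1, 2.4, 0.0, 0.0],  # task B prefers experts 2 and 3
    [0.0, 0.0, 0.0, 0.0, 2.9, 2.6],  # task C prefers experts 4 and 5
]
baseline = independent_route(logits, k=2)
probs, routes = progressive_route(logits, k_shared=1, k_private=1)
print("independent:", len(executed_experts(baseline)))  # 6 distinct experts run
print("progressive:", len(executed_experts(routes)))  # 1 shared + 3 private = 4
print("balance loss:", round(global_balance_loss([probs], [routes], num_experts=6), 4))
```

Pooling f and p over all task gates, rather than balancing each gate separately, is what lets a regularizer of this kind see the aggregate load that independent task-wise routing can skew. In a real training framework the gate probabilities would carry gradients; this sketch only computes the forward value.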

翻译：工业推荐系统通常依赖多任务学习来估计多样化的用户反馈信号，并将其聚合用于排序。模型规模扩展的最新进展在推荐领域显示出可观的性能提升。然而，简单地增加模型容量会带来极高的在线推理成本，并且对于标签分布偏斜的稀疏任务往往产生收益递减。均匀参数扩展与异构任务容量需求之间的不匹配，构成了可扩展多任务推荐的根本性挑战。本研究将参数稀疏化作为一种原则性的扩展范式进行探索，并识别出将稀疏混合专家模型应用于多任务推荐时的两个关键障碍：破坏实例级稀疏性的专家激活爆炸，以及由独立任务路由引起的专家负载倾斜。为应对这些挑战，我们提出SMES——一种具有渐进式专家路由的可扩展稀疏混合专家框架。SMES将专家激活分解为跨任务联合选择的任务共享专家子集和任务自适应的私有专家，在保持任务特定容量的同时显式约束每个实例的专家执行数量。此外，SMES引入了一个全局多门控负载均衡正则化器，通过调节所有任务的聚合专家利用率来稳定训练。SMES已在快手大规模短视频服务中部署，支持超过4亿日活跃用户。大量在线实验证明了其稳定的改进效果，GAUC提升0.29%，用户观看时长增加0.31%。