Scaling deep learning recommendation models is an effective way to improve model expressiveness. However, existing approaches often incur substantial computational overhead, making them difficult to deploy in large-scale industrial systems under strict latency constraints. Recent sparse activation scaling methods, such as Sparse Mixture-of-Experts, reduce computation by activating only a subset of parameters, but still suffer from high memory access costs and limited personalization capacity due to the large size and small number of experts. To address these challenges, we propose MSN, a memory-based sparse activation scaling framework for recommendation models. MSN dynamically retrieves personalized representations from a large parameterized memory and integrates them into downstream feature interaction modules via a memory gating mechanism, enabling fine-grained personalization with low computational overhead. To further expand memory capacity while keeping both computational and memory access costs under control, MSN adopts a Product-Key Memory (PKM) mechanism, which reduces memory retrieval from linear to sub-linear complexity. In addition, normalization and over-parameterization techniques are introduced to maintain balanced memory utilization and prevent memory retrieval collapse. We further design a customized Sparse-Gather operator and adopt the AirTopK operator to improve training and inference efficiency in industrial settings. Extensive experiments demonstrate that MSN consistently improves recommendation performance while maintaining high efficiency. Moreover, MSN has been successfully deployed in the Douyin Search Ranking System, achieving significant gains over deployed state-of-the-art models in both offline evaluation metrics and large-scale online A/B tests.
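To illustrate how product-key retrieval achieves sub-linear cost, the following is a minimal NumPy sketch of a PKM lookup. It is not MSN's implementation; all names (`subkeys1`, `pkm_retrieve`, the dimensions `n`, `half_d`, `k`, `v_dim`) are illustrative assumptions. The query is split into two halves, each half is scored against its own small codebook of sub-keys, and only the k x k Cartesian candidates are scored against the implicit n x n key space.

```python
import numpy as np

# Hedged sketch of Product-Key Memory retrieval. Assumed setup:
# two codebooks of n sub-keys each span an implicit n*n key space,
# but only n scores per half plus k*k candidate scores are computed.
rng = np.random.default_rng(0)

n, half_d, k, v_dim = 64, 8, 4, 16              # sub-keys per codebook, half query dim, top-k, value dim
subkeys1 = rng.standard_normal((n, half_d))     # codebook for the first query half
subkeys2 = rng.standard_normal((n, half_d))     # codebook for the second query half
values   = rng.standard_normal((n * n, v_dim))  # one memory value per (i, j) key pair

def pkm_retrieve(query):
    """Return a softmax-weighted sum of the top-k memory values.

    Scoring cost is O(n * half_d + k^2) rather than O(n^2 * d) for
    scoring every full key directly.
    """
    q1, q2 = query[:half_d], query[half_d:]
    s1 = subkeys1 @ q1                           # half-1 scores, shape (n,)
    s2 = subkeys2 @ q2                           # half-2 scores, shape (n,)
    i1 = np.argsort(s1)[-k:]                     # top-k sub-keys in each half
    i2 = np.argsort(s2)[-k:]
    # Full-key score of pair (i, j) decomposes as s1[i] + s2[j],
    # so only the k*k Cartesian candidates need to be considered.
    cand = (s1[i1][:, None] + s2[i2][None, :]).ravel()   # (k*k,)
    top = np.argsort(cand)[-k:]                          # overall top-k
    rows, cols = np.unravel_index(top, (k, k))
    slot_ids = i1[rows] * n + i2[cols]                   # indices into the n*n value table
    w = np.exp(cand[top] - cand[top].max())
    w /= w.sum()                                         # softmax over retained slots
    return w @ values[slot_ids]                          # (v_dim,)

out = pkm_retrieve(rng.standard_normal(2 * half_d))
print(out.shape)  # (16,)
```

With this decomposition, doubling the number of sub-keys per codebook quadruples the addressable memory (n*n slots) while the per-query scoring cost grows only linearly in n, which is the property the abstract refers to as sub-linear retrieval.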