Recommender systems (RecSys) are increasingly emphasizing scaling, leveraging larger architectures and more interaction data to improve personalization. Yet, despite the optimizer's pivotal role in training, modern RecSys pipelines almost universally default to Adam/AdamW, with limited scrutiny of whether these choices are truly optimal for recommendation. In this work, we revisit optimizer design for scalable recommendation and introduce MuonRec, the first framework that brings the recently proposed Muon optimizer to RecSys training. Muon performs orthogonalized momentum updates for 2D weight matrices via Newton-Schulz iteration, promoting diverse update directions and improving optimization efficiency. We develop an open-source training recipe for recommendation models and evaluate it across both traditional sequential recommenders and modern generative recommenders. Extensive experiments demonstrate that MuonRec reduces the number of training steps needed to converge by an average of 32.4\% while simultaneously improving final ranking quality. Specifically, MuonRec yields consistent relative gains in NDCG@10, averaging 12.6\% across all settings, with particularly pronounced improvements in generative recommendation models. MuonRec consistently outperforms strong Adam/AdamW baselines across these settings, positioning Muon as a promising new optimizer standard for RecSys training. Our code is available.
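The core operation referenced above, orthogonalizing a 2D momentum matrix via Newton-Schulz iteration, can be sketched as follows. This is a minimal NumPy illustration based on the publicly described Muon quintic iteration (coefficients 3.4445, -4.7750, 2.0315); it is not the authors' MuonRec code, and the function name and step count are illustrative assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a 2D matrix G (illustrative sketch).

    Writing G = U S V^T, each iteration applies the odd polynomial
    f(s) = a*s + b*s^3 + c*s^5 to the singular values, pushing them
    toward 1 so the result approximates the orthogonal factor U V^T.
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon recipe
    # Normalize so all singular values are <= 1 (Frobenius norm bounds the spectral norm).
    X = G / (np.linalg.norm(G) + 1e-7)
    # Work with the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # equals U f(S) V^T
    return X.T if transposed else X
```

After a few iterations the singular values of the output cluster near 1, which is what makes the update direction well-conditioned regardless of the raw momentum's spectrum.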