In recent years, the scaling laws of recommendation models, which govern how recommender performance scales with parameters and FLOPs, have attracted increasing attention. Currently, there are three mainstream architectures for scaling recommendation models: attention-based, TokenMixer-based, and factorization-machine-based methods, which differ fundamentally in both design philosophy and architectural structure. In this paper, we propose \textbf{UniMixer}, a unified scaling architecture for recommender systems that aims to improve scaling efficiency and to establish a single theoretical framework covering the mainstream scaling blocks. By transforming the rule-based TokenMixer into an equivalent parameterized structure, we construct a generalized parameterized feature mixing module whose token mixing patterns can be optimized and learned during training. This generalized parameterized token mixing also removes the TokenMixer constraint that the number of heads must equal the number of tokens. Furthermore, we establish a unified design framework for scaling modules in recommender systems, which bridges attention-based, TokenMixer-based, and factorization-machine-based methods. To further boost scaling ROI, we design a lightweight UniMixing module, \textbf{UniMixing-Lite}, which further compresses model parameters and computational cost while significantly improving model performance. The scaling curves are shown in the following figure. Extensive offline and online experiments verify the superior scaling ability of \textbf{UniMixer}.
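To make the claim that a rule-based TokenMixer has an equivalent parameterized form more concrete, here is a minimal NumPy sketch. It assumes a RankMixer-style TokenMixer in which each of the T tokens is split into H heads and output token h concatenates head h from every input token. Because that rule yields H output tokens, H = T is needed to preserve the token count. The sketch writes the same rule as a fixed permutation matrix acting on head-sized chunks, then replaces it with a learnable chunk-mixing matrix `mix` that works for any H dividing D. All function names and the chunk-level formulation are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def rule_based_token_mixer(x, num_heads):
    # Assumed RankMixer-style rule: split each of T tokens into H heads,
    # then output token h = concat of head h from every input token.
    T, D = x.shape
    d = D // num_heads
    return x.reshape(T, num_heads, d).transpose(1, 0, 2).reshape(num_heads, T * d)

def permutation_matrix(num_tokens, num_heads):
    # Chunk t*H + h of the input moves to position h*T + t of the output.
    n = num_tokens * num_heads
    idx = np.arange(n).reshape(num_tokens, num_heads).T.reshape(n)  # idx[h*T + t] = t*H + h
    return np.eye(n)[idx]

def parameterized_token_mixer(x, mix, num_heads):
    # Hypothetical generalization: mix head-sized chunks with a matrix of
    # shape (T*H, T*H) and regroup into T tokens of width D, for any H | D.
    T, D = x.shape
    d = D // num_heads
    chunks = x.reshape(T * num_heads, d)  # chunk t*H + h = head h of token t
    return (mix @ chunks).reshape(T, D)

rng = np.random.default_rng(0)
T, D = 4, 8
x = rng.standard_normal((T, D))

# With H == T, the permutation matrix reproduces the rule-based mixer exactly.
same = np.allclose(parameterized_token_mixer(x, permutation_matrix(T, T), T),
                   rule_based_token_mixer(x, T))

# With H != T, the rule yields H tokens, while a learnable mix keeps T tokens.
H = 2
rule_shape = rule_based_token_mixer(x, H).shape
mix = rng.standard_normal((T * H, T * H))  # stands in for a trained parameter
learned_shape = parameterized_token_mixer(x, mix, H).shape
print(same, rule_shape, learned_shape)
```

Initializing `mix` to the permutation matrix starts training from the rule-based behavior. Letting it deviate is one way to read "learned token mixing patterns"; a dense (T·H)×(T·H) matrix is the simplest choice here, not necessarily the structure UniMixer uses.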