In recent years, the scaling laws of recommendation models, which govern the relationship between model performance and the number of parameters/FLOPs, have attracted increasing attention. Currently, there are three mainstream architectures for scaling recommendation models, namely attention-based, TokenMixer-based, and factorization-machine-based methods, which differ fundamentally in both design philosophy and architectural structure. In this paper, we propose a unified scaling architecture for recommender systems, \textbf{UniMixer}, which improves scaling efficiency and provides a single theoretical framework that subsumes the mainstream scaling blocks. By transforming the rule-based TokenMixer into an equivalent parameterized structure, we construct a generalized parameterized feature-mixing module whose token-mixing patterns are optimized and learned during model training. This generalized parameterized token mixing also removes the TokenMixer constraint that the number of heads must equal the number of tokens. Furthermore, we establish a unified scaling-module design framework for recommender systems, which bridges attention-based, TokenMixer-based, and factorization-machine-based methods. To further boost scaling ROI, we design a lightweight UniMixing module, \textbf{UniMixing-Lite}, which further compresses model parameters and computational cost while significantly improving model performance. The scaling curves are shown in the following figure. Extensive offline and online experiments verify the superior scaling ability of \textbf{UniMixer}.
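To make the parameterized token mixing concrete, the following is a minimal sketch of a learnable token-mixing layer, assuming a PyTorch-style implementation; the class and argument names (\texttt{LearnableTokenMixing}, \texttt{num\_tokens}, \texttt{num\_heads}) are illustrative assumptions rather than the paper's actual API. Unlike a rule-based TokenMixer, whose mixing pattern is fixed and whose head count is tied to the token count, the mixing weights here are free parameters optimized during training, and the number of heads is independent of the number of tokens.
\begin{verbatim}
# Minimal sketch of a parameterized (learnable) token-mixing layer.
# Names and shapes are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn


class LearnableTokenMixing(nn.Module):
    def __init__(self, num_tokens: int, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # One learnable (num_tokens x num_tokens) mixing matrix per head;
        # num_heads need not equal num_tokens.
        self.mixing = nn.Parameter(
            torch.randn(num_heads, num_tokens, num_tokens) * 0.02
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        b, t, d = x.shape
        h = x.view(b, t, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        # Learned token mixing: each head mixes tokens with its own matrix.
        mixed = torch.einsum("hst,bhtd->bhsd", self.mixing, h)
        mixed = mixed.permute(0, 2, 1, 3).reshape(b, t, d)
        return self.proj(mixed)


if __name__ == "__main__":
    layer = LearnableTokenMixing(num_tokens=16, dim=64, num_heads=4)
    out = layer(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
\end{verbatim}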