Scaling Law of Large Sequential Recommendation Models

Scaling of neural networks has recently shown great potential to improve model capacity in various fields. Specifically, model performance has a power-law relationship with model size or data size, which provides important guidance for the development of large-scale models. However, there is still limited understanding of the scaling effect of user behavior models in recommender systems, where unique data characteristics (e.g., data scarcity and sparsity) pose new challenges for exploring the scaling effect in recommendation tasks. In this work, we focus on investigating the scaling laws of large sequential recommendation models. Specifically, we consider a pure ID-based task formulation, where the interaction history of a user is formatted as a chronological sequence of item IDs. We do not incorporate any side information (e.g., item text), because we want to explore how the scaling law holds from the perspective of user behavior alone. Using specially improved strategies, we scale up the model size to 0.8B parameters, making it feasible to explore the scaling effect across a diverse range of model sizes. As our major finding, we empirically show that the scaling law still holds for these trained models, even in data-constrained scenarios. We then fit the scaling-law curve and successfully predict the test loss of the two largest tested model scales. Furthermore, we examine the performance advantage of scaling on five challenging recommendation tasks, considering issues unique to recommender systems (e.g., cold start, robustness, long-term preference). We find that scaling up the model size can greatly boost performance on these challenging tasks, which again verifies the benefits of large recommendation models.
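
To make the curve-fitting step concrete, the following is a minimal sketch of how a power-law relationship between model size and test loss could be fitted on smaller models and then extrapolated to the two largest ones. It is an illustration only: the functional form L(N) = a * N^(-alpha), the helper names `fit_power_law` and `predict_loss`, and all numbers are assumptions made for this example. The data are synthetic placeholders, not results from this work, and the exact form fitted in the paper may differ (for instance, it may include an irreducible-loss term).

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) by least squares in log-log space.

    Assumed functional form for illustration; returns (a, alpha).
    """
    # log L = log a - alpha * log N, so a degree-1 fit gives slope = -alpha.
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, alpha, sizes):
    """Evaluate the fitted power law at the given model sizes."""
    return a * np.asarray(sizes, dtype=float) ** (-alpha)


# Synthetic, noise-free placeholder data generated from a known power law.
# These values are hypothetical and are NOT measurements from the paper.
true_a, true_alpha = 12.0, 0.08
sizes = np.array([1e5, 1e6, 1e7, 1e8, 4e8, 8e8])  # parameter counts
losses = true_a * sizes ** (-true_alpha)

# Fit on all but the two largest sizes, then extrapolate to those two.
a_hat, alpha_hat = fit_power_law(sizes[:-2], losses[:-2])
predicted = predict_loss(a_hat, alpha_hat, sizes[-2:])
rel_err = np.abs(predicted - losses[-2:]) / losses[-2:]
```

Fitting in log-log space turns the power law into a straight line, so an ordinary linear least-squares fit suffices. With real, noisy loss measurements, the relative error on the held-out sizes indicates how well the fitted law extrapolates.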
