The sequential recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. SR models examine the sequence of a user's actions to discern complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation, either by framing sequential recommendation as language modeling or by using an LLM as the backbone for user representation. Although these methods deliver outstanding performance, there is scant evidence that a large language model is necessary, or of how large a model is actually needed, especially in the sequential recommendation setting. Meanwhile, owing to the huge size of LLMs, it is inefficient and impractical to deploy an LLM-based model on real-world platforms that often need to process billions of traffic logs daily. In this paper, we explore the influence of LLM depth by conducting extensive experiments on large-scale industry datasets. Surprisingly, our motivational experiments reveal that most intermediate layers of LLMs are redundant, indicating that these layers can be pruned while the remaining ones still maintain strong performance. Motivated by this insight, we empower small language models for SR with SLMRec, which adopts a simple yet effective knowledge distillation method. Moreover, SLMRec is orthogonal to other post-training efficiency techniques, such as quantization and pruning, so they can be applied in combination. Comprehensive experimental results illustrate that the proposed SLMRec model attains the best performance using only 13% of the parameters of LLM-based recommendation models, while simultaneously achieving up to 6.6x and 8.0x speedups in training and inference time, respectively. Besides, we provide a theoretical justification for why small language models can perform comparably to large language models in SR.
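The layer-wise distillation idea summarized above can be illustrated with a minimal sketch. All names, the uniform student-to-teacher layer mapping, and the loss weighting below are illustrative assumptions, not the paper's exact recipe: a shallow student keeps only a few transformer layers, and each of its hidden states is aligned to a uniformly spaced teacher layer with an MSE term added to the usual next-item prediction loss.

```python
import numpy as np

def distillation_loss(teacher_hidden, student_hidden, alpha=1.0):
    """Hypothetical layer-wise distillation term (a sketch, not the paper's loss).

    teacher_hidden: list of (seq_len, dim) arrays, one per teacher layer.
    student_hidden: list of (seq_len, dim) arrays, one per student layer
                    (assumed to evenly divide the teacher's depth).
    Student layer i is aligned with teacher layer (i + 1) * stride - 1,
    i.e. every `stride`-th teacher layer, via mean-squared error.
    """
    n_teacher, n_student = len(teacher_hidden), len(student_hidden)
    stride = n_teacher // n_student
    loss = 0.0
    for i, h_student in enumerate(student_hidden):
        h_teacher = teacher_hidden[(i + 1) * stride - 1]
        loss += np.mean((h_student - h_teacher) ** 2)
    return alpha * loss / n_student

# Toy check with random "hidden states": an 8-layer teacher distilled
# into a 4-layer student. A student that exactly matches the mapped
# teacher layers incurs zero distillation loss.
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(5, 8)) for _ in range(8)]
student = [teacher[(i + 1) * 2 - 1] for i in range(4)]
print(distillation_loss(teacher, student))  # 0.0
```

In practice this term would be optimized jointly with the recommendation objective; the sketch only shows how pruned intermediate layers can be compensated by aligning the student's remaining representations to the teacher's.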