The sequential recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the strong impact of large language models (LLMs) on sequential recommendation systems, either by treating sequential recommendation as language modeling or by using an LLM as the backbone for user representation. Although these methods deliver outstanding performance, there is scant evidence on whether a large language model is necessary and how large the model needs to be, especially in the sequential recommendation setting. Meanwhile, because of their huge size, LLM-based models are inefficient and impractical to deploy on real-world platforms that often need to process billions of traffic logs daily. In this paper, we explore the influence of LLMs' depth by conducting extensive experiments on large-scale industry datasets. Surprisingly, we discover that most intermediate layers of LLMs are redundant. Motivated by this insight, we empower small language models for SR with SLMRec, which adopts a simple yet effective knowledge distillation method. Moreover, SLMRec is orthogonal to other post-training efficiency techniques, such as quantization and pruning, so they can be used in combination. Comprehensive experimental results show that the proposed SLMRec model attains the best performance using only 13% of the parameters of LLM-based recommendation models, while achieving up to 6.6x and 8.0x speedups in training and inference time, respectively.
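The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one common form of layer-wise feature distillation, under stated assumptions: a shallow student is aligned with evenly spaced layers of a deeper teacher, and the loss is the mean cosine distance between matched hidden states. The function names (`matched_teacher_layers`, `cosine_distance`, `feature_distillation_loss`), the even spacing, and the choice of cosine distance are all hypothetical and are not taken from SLMRec itself.

```python
import math


def matched_teacher_layers(teacher_depth, student_depth):
    """Return 1-based teacher layer indices aligned with each student layer.

    Assumption: layers are matched at evenly spaced depths, so student
    layer i is compared with teacher layer i * (teacher_depth // student_depth).
    """
    if student_depth < 1 or student_depth > teacher_depth:
        raise ValueError("student_depth must be in [1, teacher_depth]")
    block = teacher_depth // student_depth
    return [block * (i + 1) for i in range(student_depth)]


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def feature_distillation_loss(teacher_states, student_states):
    """Mean cosine distance between matched teacher and student hidden states.

    teacher_states / student_states: one hidden vector per layer,
    ordered from the first layer to the last.
    """
    layers = matched_teacher_layers(len(teacher_states), len(student_states))
    terms = [
        cosine_distance(teacher_states[t - 1], s)
        for t, s in zip(layers, student_states)
    ]
    return sum(terms) / len(terms)


# Example: a 4-layer student distilled from a 32-layer teacher.
print(matched_teacher_layers(32, 4))  # [8, 16, 24, 32]
```

In a real training loop this term would typically be added to the recommendation loss and minimized with respect to the student's parameters only, with the hidden states coming from the two models rather than from hand-written lists.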