Sequential recommendation has recently been adapted to the LLM paradigm to take advantage of the capabilities of large language models. LLM-based methods usually express recommendation information in natural language and train the model to predict the next item auto-regressively. Despite their notable success, the substantial computational overhead of inference is a significant obstacle to real-world deployment. In this work, we streamline existing LLM-based recommendation models and propose Lite-LLM4Rec, a simple yet highly effective model whose primary goal is efficient inference for sequential recommendation. Lite-LLM4Rec avoids beam search decoding by using a direct item projection head to generate ranking scores. This design stems from our empirical observation that beam search decoding is unnecessary for sequential recommendation. In addition, Lite-LLM4Rec introduces a hierarchical LLM structure tailored to handle the extensive contextual information associated with items efficiently, reducing computational overhead while retaining the capabilities of LLMs. Experiments on three publicly available datasets corroborate the effectiveness of Lite-LLM4Rec in both recommendation performance and inference efficiency over existing LLM-based methods (notably a 46.8% performance improvement and a 97.28% efficiency improvement on ML-1m). Our implementation will be open-sourced.
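To make the idea of an item projection head concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the head is a single linear map from the LLM's final hidden state to one score per candidate item, so that a full ranking comes from one scoring pass instead of token-by-token beam search. The abstract does not specify the head's exact form, and the names `project_scores`, `top_k`, `hidden`, and `item_weights`, along with all numeric values, are hypothetical.

```python
from typing import List, Sequence


def project_scores(
    hidden: Sequence[float],
    item_weights: Sequence[Sequence[float]],
) -> List[float]:
    """Score every candidate item with one dot product per item.

    `hidden` stands in for the LLM's final hidden state (assumed),
    and each row of `item_weights` is the projection vector of one item.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in item_weights]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring items, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy values chosen for illustration only.
hidden = [1.0, 0.0, 2.0]
item_weights = [
    [0.5, 0.1, 0.0],    # item 0
    [0.0, 0.3, 1.0],    # item 1
    [1.0, 0.0, 0.25],   # item 2
    [-1.0, 0.0, 0.0],   # item 3
]

scores = project_scores(hidden, item_weights)
ranking = top_k(scores, k=2)
print(ranking)  # [1, 2]
```

Under this assumption, the cost of ranking is one matrix-vector product over the item catalogue, independent of how many tokens an item title would take to decode, which is one plausible source of the inference savings the abstract reports.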