Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation

Large language models (LLMs) have recently shown great potential in recommender systems, either by improving existing recommendation models or by serving as the backbone. However, a large semantic gap separates LLMs from recommender systems, because the items to be recommended are usually indexed by discrete identifiers (item IDs) that lie outside the LLM's vocabulary. In essence, LLMs capture language semantics, whereas recommender systems carry implicit collaborative semantics, which makes it difficult to fully exploit the model capacity of LLMs for recommendation. To address this challenge, we propose LC-Rec, a new LLM-based recommendation model that better integrates language and collaborative semantics. Our approach generates items directly from the entire item set, without relying on candidate items. It makes two major contributions. For item indexing, we design a learning-based vector quantization method with uniform semantic mapping, which assigns meaningful and non-conflicting IDs (called item indices) to items. For alignment tuning, we propose a series of specially designed tuning tasks that enhance the integration of collaborative semantics in LLMs. These fine-tuning tasks push LLMs to deeply integrate language and collaborative semantics (characterized by the learned item indices), thereby achieving an effective adaptation to recommender systems. Extensive experiments demonstrate the effectiveness of our method, showing that it outperforms a number of competitive baselines, including traditional recommenders and existing LLM-based recommenders. Our code is available at https://github.com/RUCAIBox/LC-Rec/.
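
To make the idea of item indices more concrete, the following is a minimal sketch of multi-level residual quantization, one common form of vector quantization. It is an illustration under stated assumptions, not the LC-Rec implementation: the codebooks here are random rather than learned, the function names and the `<a_3><b_0><c_5>` token format are hypothetical, and the uniform semantic mapping that the abstract credits with avoiding conflicting IDs is not implemented, so two items may receive the same index in this sketch.

```python
import random


def nearest(vec, codebook):
    """Return the position of the codebook vector closest to vec (squared L2)."""
    best, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def residual_quantize(vec, codebooks):
    """Quantize vec level by level, subtracting the chosen code at each level."""
    codes, residual = [], list(vec)
    for codebook in codebooks:
        k = nearest(residual, codebook)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes


def to_index_tokens(codes):
    """Render codes as hypothetical tokens, one prefix letter per level."""
    return "".join(f"<{chr(ord('a') + level)}_{k}>" for level, k in enumerate(codes))


# Assumption: random codebooks and random item embeddings stand in for
# learned codebooks and text-derived item embeddings.
rng = random.Random(0)
dim, levels, codebook_size = 8, 3, 4
codebooks = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(codebook_size)]
    for _ in range(levels)
]
items = {f"item_{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(5)}
indices = {
    name: to_index_tokens(residual_quantize(vec, codebooks))
    for name, vec in items.items()
}
for name, index in indices.items():
    print(name, index)
```

Each item ends up with a short sequence of discrete codes, one per level. If such codes were added to an LLM's vocabulary as new tokens, the model could generate an item by emitting its code sequence, which is the general mechanism that makes generation over the entire item set possible without a candidate list.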
