Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation

Large language models (LLMs) have recently shown great potential in recommender systems, either by improving existing recommendation models or by serving as the backbone. However, a large semantic gap separates LLMs from recommender systems, because the items to be recommended are usually indexed by discrete identifiers (item IDs) that lie outside the LLM's vocabulary. In essence, LLMs capture language semantics whereas recommender systems implicitly encode collaborative semantics, which makes it difficult to fully exploit the capacity of LLMs for recommendation. To address this challenge, we propose LC-Rec, a new LLM-based recommendation model that better integrates language and collaborative semantics. Our approach generates items directly from the entire item set, without relying on candidate items. Specifically, it makes two major contributions. For item indexing, we design a learning-based vector quantization method with uniform semantic mapping, which assigns meaningful and non-conflicting IDs (called item indices) to items. For alignment tuning, we propose a series of specially designed tuning tasks to enhance the integration of collaborative semantics in LLMs. These fine-tuning tasks compel LLMs to deeply integrate language and collaborative semantics (the latter characterized by the learned item indices), so as to adapt effectively to recommender systems. Extensive experiments demonstrate the effectiveness of our method, showing that it outperforms a number of competitive baselines, including traditional recommenders and existing LLM-based recommenders. Our code is available at https://github.com/RUCAIBox/LC-Rec/.
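
To make the notion of item indices concrete, the following minimal Python sketch quantizes item embeddings into a short tuple of discrete codes and renders each tuple as a sequence of special tokens that could be added to an LLM's vocabulary. It is not the LC-Rec implementation. The residual quantization scheme, the hand-written codebooks, the token format, and the helper names (`quantize`, `to_tokens`, `find_conflicts`) are all assumptions made for illustration, and neither the learned quantizer nor the uniform semantic mapping is reproduced here.

```python
from collections import defaultdict


def nearest(vec, codebook):
    """Return the position of the codebook vector closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )


def quantize(vec, codebooks):
    """Residual quantization (an assumed scheme): at each level pick the
    nearest code, then subtract it and pass the residual to the next level."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        k = nearest(residual, codebook)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tuple(codes)


def to_tokens(codes):
    """Render a code tuple as special tokens, e.g. (1, 0) -> <a_1><b_0>.
    The token format is hypothetical."""
    return "".join(f"<{chr(ord('a') + level)}_{k}>" for level, k in enumerate(codes))


def find_conflicts(indices):
    """Group items by index and keep only the indices shared by several items."""
    groups = defaultdict(list)
    for item, codes in indices.items():
        groups[codes].append(item)
    return {codes: items for codes, items in groups.items() if len(items) > 1}


# Toy, hand-written codebooks; a real system would learn them from item embeddings.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],  # level 0: coarse codes
    [[0.0, 0.0], [1.0, 1.0]],    # level 1: finer codes applied to the residual
]
items = {
    "item_A": [11.0, 11.0],
    "item_B": [0.2, 0.1],
    "item_C": [10.9, 11.2],
}

indices = {name: quantize(vec, codebooks) for name, vec in items.items()}
for name, codes in indices.items():
    print(name, to_tokens(codes))
print("conflicts:", find_conflicts(indices))
```

In this toy setup `item_A` and `item_C` collapse onto the same index, which is the kind of collision that a non-conflicting index assignment has to rule out.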
