Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation

Recently, large language models (LLMs) have shown great potential in recommender systems, either improving existing recommendation models or serving as the recommendation backbone. However, there is a large semantic gap between LLMs and recommender systems, since the items to be recommended are often indexed by discrete identifiers (item IDs) that lie outside the LLM's vocabulary. In essence, LLMs capture language semantics while recommender systems encode collaborative semantics, which makes it difficult to fully leverage the capacity of LLMs for recommendation. To address this challenge, we propose a new LLM-based recommendation model called LC-Rec, which better integrates language and collaborative semantics for recommender systems. Our approach directly generates items from the entire item set, without relying on a set of candidate items. Specifically, we make two major contributions. For item indexing, we design a learning-based vector quantization method with uniform semantic mapping, which assigns meaningful and non-conflicting IDs (called item indices) to items. For alignment tuning, we propose a series of specially designed tuning tasks that enhance the integration of collaborative semantics in LLMs. These fine-tuning tasks require LLMs to deeply integrate language and collaborative semantics (characterized by the learned item indices), so as to achieve an effective adaptation to recommender systems. Extensive experiments demonstrate the effectiveness of our method, which outperforms a number of competitive baselines, including traditional recommenders and existing LLM-based recommenders. Our code is available at https://github.com/RUCAIBox/LC-Rec/.
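
The abstract only names the indexing idea, so the following is a rough illustration rather than the LC-Rec implementation. It assumes a multi-level residual quantizer with fixed toy codebooks, whereas the method described above learns its quantizer. Each item embedding is mapped to a tuple of codes. Conflicts at the last level are then resolved with a greedy capacity-constrained assignment, a simple stand-in for uniform semantic mapping and not the paper's procedure. The codebooks, the `<a_0><b_1>` token format, and the function names `residual_quantize`, `assign_indices`, and `to_tokens` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Codebook = List[Vector]  # hypothetical: one list of code vectors per level


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(x: Vector, codebooks: List[Codebook]) -> Tuple[Tuple[int, ...], List[float]]:
    """Greedy residual quantization: at each level pick the code nearest to the
    current residual, subtract it, and pass what is left to the next level.
    Returns the chosen code per level and the final residual."""
    residual = list(x)
    codes = []
    for book in codebooks:
        dists = [sq_dist(residual, code) for code in book]
        k = dists.index(min(dists))
        codes.append(k)
        residual = [r - c for r, c in zip(residual, book[k])]
    return tuple(codes), residual


def assign_indices(items: Dict[str, Vector], codebooks: List[Codebook]) -> Dict[str, Tuple[int, ...]]:
    """Quantize the prefix levels greedily, then give items that share a prefix
    distinct last-level codes (each code used at most once per prefix), so no
    two items receive the same index tuple. Stand-in, not the paper's method."""
    *prefix_books, last_book = codebooks
    groups: Dict[Tuple[int, ...], List[Tuple[str, List[float]]]] = defaultdict(list)
    for item_id, x in items.items():
        prefix, residual = residual_quantize(x, prefix_books)
        groups[prefix].append((item_id, residual))

    indices: Dict[str, Tuple[int, ...]] = {}
    for prefix, members in groups.items():
        if len(members) > len(last_book):
            raise ValueError(f"prefix {prefix} has more items than last-level codes")
        # All (distance, item, code) pairs, cheapest first; take a pair only
        # while both the item and the code are still unassigned.
        candidates = sorted(
            (sq_dist(res, code), item_id, k)
            for item_id, res in members
            for k, code in enumerate(last_book)
        )
        used_codes, done = set(), set()
        for _, item_id, k in candidates:
            if item_id not in done and k not in used_codes:
                indices[item_id] = prefix + (k,)
                done.add(item_id)
                used_codes.add(k)
    return indices


def to_tokens(index: Tuple[int, ...]) -> str:
    """Render an index tuple as one hypothetical new token per (level, code)."""
    return "".join(f"<{chr(ord('a') + level)}_{code}>" for level, code in enumerate(index))


# Toy example: i1 and i2 have identical embeddings and would collide
# without the last-level reassignment.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],             # level 0
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # level 1 (last)
]
items = {"i1": [1.0, 0.0], "i2": [1.0, 0.0], "i3": [10.0, 11.0]}
indices = assign_indices(items, codebooks)
print(indices)  # {'i1': (0, 0), 'i2': (0, 1), 'i3': (1, 1)}
print({item: to_tokens(ix) for item, ix in indices.items()})
# {'i1': '<a_0><b_0>', 'i2': '<a_0><b_1>', 'i3': '<a_1><b_1>'}
```

Greedy pairing is used here only for brevity and does not minimize the total distance. In the approach described above, the codebooks are learned rather than fixed, and the resulting index tokens are what the LLM is tuned to read and generate.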

翻译：近年来，大型语言模型（LLMs）在推荐系统中展现出巨大潜力，既可改进现有推荐模型，也可作为核心架构。然而，由于待推荐物品通常采用LLM词表之外的离散标识符（物品ID）进行索引，LLM与推荐系统之间存在显著的语义鸿沟。本质上，LLM捕捉语言语义，而推荐系统蕴含协同语义，这导致难以充分释放LLM在推荐任务中的模型能力。针对这一挑战，本文提出名为LC-Rec的新型LLM推荐模型，能够更好地融合语言语义与协同语义。该方法无需依赖候选物品，可直接从完整物品集合中生成推荐结果。具体而言，我们的贡献体现在两个核心方面：在物品索引层面，我们设计了基于学习的向量量化方法，通过统一语义映射为物品分配具有语义且无冲突的ID（称为物品索引）；在对齐调优层面，我们提出一系列定制化调优任务，以增强LLM中协同语义的整合。这些微调任务强制LLM深度整合语言语义与基于物品索引的协同语义，从而实现对推荐系统的有效适配。大量实验证明本方法有效性，其性能优于包括传统推荐模型和现有LLM推荐模型在内的多个强基线。代码已开源：https://github.com/RUCAIBox/LC-Rec/。