Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions inherent in recommendation tasks. To address this, we propose bridging the knowledge gap by equipping LLMs with recommendation-specific knowledge. Operations such as Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) have found success in conventional recommender systems. Inspired by this, we simulate these operations through natural language to generate auxiliary-task data samples that encode item correlations and user preferences. Fine-tuning LLMs on these auxiliary-task data samples, together with more informative recommendation-task data samples, facilitates the injection of recommendation-specific knowledge into LLMs. Extensive experiments across retrieval, ranking, and rating prediction tasks on LLMs such as FLAN-T5-Base and FLAN-T5-XL demonstrate the effectiveness of our technique in domains such as Amazon Toys & Games, Beauty, and Sports & Outdoors. Notably, our method outperforms conventional and LLM-based baselines, including the current SOTA, by significant margins in retrieval, showcasing its potential for enhancing recommendation quality.
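To make the idea of simulating MIM and BPR through natural language more concrete, the following is a minimal sketch of how such auxiliary-task samples might be built from a user's interaction history. The prompt wording, the `[MASK]` placeholder, the function names `make_mim_sample` and `make_bpr_sample`, and the uniform negative sampling are all illustrative assumptions, not the templates or procedure used in the paper.

```python
import random


def make_mim_sample(item_titles, mask_index, mask_token="[MASK]"):
    """Masked-item sample: hide one item in the history and ask for it back.

    The prompt wording and mask_token are hypothetical placeholders.
    """
    masked = list(item_titles)
    target = masked[mask_index]
    masked[mask_index] = mask_token
    prompt = (
        "A user bought these items in order: "
        + "; ".join(masked)
        + ". What is the masked item?"
    )
    return {"input": prompt, "target": target}


def make_bpr_sample(item_titles, positive, negative, rng):
    """Pairwise-preference sample: choose between an interacted item
    (positive) and a non-interacted item (negative).

    The two options are shuffled so the answer position carries no signal.
    """
    options = [positive, negative]
    rng.shuffle(options)
    prompt = (
        "A user bought these items: "
        + "; ".join(item_titles)
        + ". Which item would the user prefer: "
        + options[0]
        + " or "
        + options[1]
        + "?"
    )
    return {"input": prompt, "target": positive}


def sample_negative(catalog, interacted, rng):
    """Uniformly sample an item the user has not interacted with (assumed strategy)."""
    candidates = [item for item in catalog if item not in interacted]
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    history = ["yoyo", "kite", "puzzle"]          # toy interaction history
    catalog = history + ["drone", "board game"]   # toy item catalog

    print(make_mim_sample(history, mask_index=1))

    # Hold out the last item as the positive; sample a negative from the rest.
    negative = sample_negative(catalog, history, rng)
    print(make_bpr_sample(history[:-1], history[-1], negative, rng))
```

Each returned dictionary is an input/target text pair of the kind a sequence-to-sequence model such as FLAN-T5 could be fine-tuned on. The MIM-style sample targets item correlations within a history, while the BPR-style sample targets the user's preference between an interacted and a non-interacted item.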