Personalized content-based recommender systems have become indispensable tools for helping users navigate the vast amount of content available on platforms such as daily news websites and book recommendation services. However, existing recommenders face significant challenges in understanding item content. Large language models (LLMs), which possess deep semantic comprehension and extensive knowledge from pretraining, have proven effective in various natural language processing tasks. In this study, we explore the potential of leveraging both open- and closed-source LLMs to enhance content-based recommendation. With open-source LLMs, we use their deep layers as content encoders, enriching content representations at the embedding level. With closed-source LLMs, we employ prompting techniques to enrich the training data at the token level. Through comprehensive experiments, we demonstrate the effectiveness of both types of LLMs and show that they are synergistic. Notably, we observe a significant relative improvement of up to 19.32% over existing state-of-the-art recommendation models. These findings highlight the potential of both open- and closed-source LLMs for enhancing content-based recommender systems. We will make our code and LLM-generated data available so that other researchers can reproduce our results.
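To make the embedding-level idea concrete, the following is a minimal sketch of how per-token hidden states from an LLM's deep layers could be turned into item embeddings and used to rank candidates. The abstract does not specify the pooling or scoring scheme, so mean pooling, the dot-product score, and all function names and toy vectors here are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors into one vector.

    Assumption: the inputs stand in for per-token hidden states taken
    from a deep layer of an open-source LLM.
    """
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as a hypothetical relevance score."""
    return sum(x * y for x, y in zip(a, b))


def rank(user: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Return candidate ids sorted by descending score against the user vector."""
    return sorted(candidates, key=lambda k: dot(user, candidates[k]), reverse=True)


# Toy 2-dimensional "hidden states" (made up for illustration only).
sports_item = mean_pool([[1.0, 0.0], [0.8, 0.2]])
finance_item = mean_pool([[0.0, 1.0], [0.2, 0.8]])

# Assumption: a user is represented as the mean of the embeddings
# of the items they previously interacted with.
user = mean_pool([mean_pool([[1.0, 0.0], [1.0, 0.0]])])

ranking = rank(user, {"finance": finance_item, "sports": sports_item})
print(ranking)
```

In a real pipeline the toy vectors would be replaced by hidden states produced by the chosen open-source LLM, and the scoring function by whatever the downstream recommendation model learns.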