Recommendation systems have advanced significantly and have been widely deployed over the past decades. However, most traditional recommendation methods are task-specific and therefore lack the ability to generalize efficiently. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring the extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios. Unlike traditional recommendation methods, we do not fine-tune ChatGPT at any point during the evaluation, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore the use of few-shot prompting to inject interaction information that reflects users' potential interests, helping ChatGPT better understand user needs and interests. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT achieves promising results on certain tasks and reaches baseline-level performance on others. We conduct human evaluations on two explainability-oriented tasks to more accurately assess the quality of the content generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.
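To make the idea of converting a recommendation task into a natural language task more concrete, the following is a minimal sketch of how a rating-prediction prompt with few-shot interaction history might be assembled. It is an illustration under stated assumptions, not the prompts used in the paper: the `Interaction` schema, the `build_rating_prompt` function, the prompt wording, and the product titles are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    """One past user-item interaction (hypothetical schema, not from the paper)."""
    item_title: str
    rating: int  # star rating, assumed to be an integer from 1 to 5


def build_rating_prompt(history: List[Interaction], target_title: str) -> str:
    """Turn a rating-prediction task into a plain-text prompt.

    With a non-empty `history`, the user's past interactions are injected
    as few-shot context; with an empty `history`, the prompt is zero-shot.
    The wording is an assumption chosen for illustration only.
    """
    lines = ["How will the user rate this product? (1 = dislike, 5 = like)", ""]
    if history:
        lines.append("Here is the user's rating history:")
        for i, h in enumerate(history, start=1):
            lines.append(f'{i}. "{h.item_title}": {h.rating} stars')
        lines.append("")
    lines.append(f'Product to rate: "{target_title}"')
    lines.append("Answer with a single integer from 1 to 5.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical interaction data, made up for this example.
    history = [
        Interaction("Rose Hand Cream", 5),
        Interaction("Matte Lipstick", 2),
    ]
    print(build_rating_prompt(history, "Lavender Body Lotion"))
```

The resulting string would be sent to the language model as is. Because the model is never fine-tuned, the interaction history included in the prompt is the only channel through which user-specific information reaches it.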