ChatGPT has emerged as a versatile tool with demonstrated capabilities across diverse domains. Given these successes, the Recommender Systems (RSs) community has begun investigating its application to recommendation scenarios, focusing primarily on accuracy. Although the integration of ChatGPT into RSs has attracted significant attention, a comprehensive analysis of its performance along other dimensions is still lacking. In particular, its ability to provide diverse and novel recommendations, and the potential biases in its recommendations, such as popularity bias, have not been thoroughly examined. As the use of these models continues to expand, understanding these aspects is crucial for enhancing user satisfaction and achieving long-term personalization. This study investigates the recommendations provided by ChatGPT-3.5 and ChatGPT-4 in terms of diversity, novelty, and popularity bias. We evaluate these models on three distinct datasets and assess their performance in Top-N recommendation and cold-start scenarios. The findings reveal that ChatGPT-4 matches or surpasses traditional recommenders, demonstrating the ability to balance novelty and diversity in recommendations. Furthermore, in the cold-start scenario, ChatGPT models exhibit superior performance in both accuracy and novelty, suggesting they can be particularly beneficial for new users. This research highlights the strengths and limitations of ChatGPT's recommendations, offering new perspectives on the capacity of these models to provide recommendations beyond accuracy-focused metrics.