The debut of ChatGPT has attracted attention from the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT achieves significant improvements on a range of downstream NLP tasks, but its capabilities and limitations in recommendation remain unclear. In this study, we conduct an empirical analysis of ChatGPT's recommendation ability from an Information Retrieval (IR) perspective, covering point-wise, pair-wise, and list-wise ranking. To this end, we reformulate these three ranking policies as domain-specific prompts. Through extensive experiments on four datasets from different domains, we demonstrate that ChatGPT outperforms other large language models across all three ranking policies. Based on an analysis of unit cost improvements, we find that ChatGPT with list-wise ranking achieves the best trade-off between cost and performance compared to point-wise and pair-wise ranking. Moreover, ChatGPT shows potential for mitigating the cold-start problem and for explainable recommendation. To facilitate further exploration in this area, the full code and detailed original results are open-sourced at https://github.com/rainym00d/LLM4RS.
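To make the three ranking policies concrete, the following is a minimal Python sketch of how each one might be phrased as a prompt. It is not the prompt format used in the paper or in the LLM4RS repository: the function names, the prompt wording, and the 1 to 5 rating scale are all hypothetical. The `calls_needed` helper further assumes that point-wise ranking issues one request per candidate, that pair-wise ranking compares every unordered pair, and that list-wise ranking issues a single request; it is only meant to illustrate why the policies differ in cost.

```python
from typing import List


def _format_history(history: List[str]) -> str:
    # Hypothetical helper: renders the user's past interactions as one line.
    return ", ".join(history)


def pointwise_prompt(history: List[str], candidate: str) -> str:
    # Point-wise: the model scores one candidate at a time.
    return (
        f"The user has interacted with: {_format_history(history)}.\n"
        f"On a scale of 1 to 5, how likely is the user to enjoy '{candidate}'? "
        "Answer with a single number."
    )


def pairwise_prompt(history: List[str], first: str, second: str) -> str:
    # Pair-wise: the model picks the preferred item out of two candidates.
    return (
        f"The user has interacted with: {_format_history(history)}.\n"
        f"Which would the user prefer: (A) '{first}' or (B) '{second}'? "
        "Answer with A or B."
    )


def listwise_prompt(history: List[str], candidates: List[str]) -> str:
    # List-wise: the model orders the whole candidate list in one request.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        f"The user has interacted with: {_format_history(history)}.\n"
        f"Rank the following items from most to least preferred:\n{numbered}\n"
        "Answer with the item numbers in ranked order."
    )


def calls_needed(policy: str, n_candidates: int) -> int:
    # Assumed request counts per policy (an illustration, not the paper's cost model).
    if policy == "point-wise":
        return n_candidates
    if policy == "pair-wise":
        return n_candidates * (n_candidates - 1) // 2
    if policy == "list-wise":
        return 1
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    history = ["Movie A", "Movie B"]
    candidates = ["Movie C", "Movie D", "Movie E"]
    print(pointwise_prompt(history, candidates[0]))
    print(pairwise_prompt(history, candidates[0], candidates[1]))
    print(listwise_prompt(history, candidates))
    for policy in ("point-wise", "pair-wise", "list-wise"):
        print(policy, calls_needed(policy, len(candidates)))
```

Under these assumptions, ranking five candidates would take five requests point-wise, ten pair-wise, and one list-wise, which gives a rough intuition for the cost side of the trade-off discussed above.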