The debut of ChatGPT has attracted attention from the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT achieves significant improvements on a range of downstream NLP tasks, but its capabilities and limitations in recommendation remain unclear. In this study, we conduct an empirical analysis of ChatGPT's recommendation ability from an Information Retrieval (IR) perspective, covering point-wise, pair-wise, and list-wise ranking. To this end, we reformulate these three recommendation policies into a domain-specific prompt format. Through extensive experiments on four datasets from different domains, we demonstrate that ChatGPT outperforms other large language models across all three ranking policies. Based on an analysis of unit cost improvements, we find that ChatGPT with list-wise ranking achieves the best trade-off between cost and performance compared with point-wise and pair-wise ranking. Moreover, ChatGPT shows potential for mitigating the cold-start problem and for providing explainable recommendations. To facilitate further exploration in this area, the full code and detailed original results are open-sourced at https://github.com/rainym00d/LLM4RS.
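To make the three ranking policies concrete, the following is a minimal sketch of how each one might be phrased as a prompt. The template wording, the function names, and the 1-to-5 rating scale are illustrative assumptions and are not taken from the released code; the sketch also assumes that pair-wise ranking compares every candidate pair, which the abstract does not specify.

```python
from itertools import combinations


def pointwise_prompt(history, candidate):
    """Hypothetical template: ask for a score for one candidate item."""
    return (
        f"The user has interacted with: {', '.join(history)}.\n"
        f'On a scale of 1 to 5, how likely is the user to enjoy "{candidate}"? '
        "Answer with a single number."
    )


def pairwise_prompt(history, item_a, item_b):
    """Hypothetical template: ask which of two candidates is preferred."""
    return (
        f"The user has interacted with: {', '.join(history)}.\n"
        f'Which item would the user prefer: (A) "{item_a}" or (B) "{item_b}"? '
        "Answer with A or B."
    )


def listwise_prompt(history, candidates):
    """Hypothetical template: ask for a full ordering of all candidates."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        f"The user has interacted with: {', '.join(history)}.\n"
        "Rank the following items from most to least preferred, "
        "answering with the item numbers in order:\n"
        f"{numbered}"
    )


def build_prompts(history, candidates):
    """Build every prompt each policy would need for one candidate list."""
    return {
        # One prompt per candidate.
        "point-wise": [pointwise_prompt(history, c) for c in candidates],
        # One prompt per unordered pair (assumption: all pairs are compared).
        "pair-wise": [
            pairwise_prompt(history, a, b) for a, b in combinations(candidates, 2)
        ],
        # A single prompt covering the whole list.
        "list-wise": [listwise_prompt(history, candidates)],
    }


if __name__ == "__main__":
    prompts = build_prompts(
        ["Item A", "Item B"], ["Item W", "Item X", "Item Y", "Item Z"]
    )
    for policy, batch in prompts.items():
        print(policy, len(batch))
```

Under these assumptions, ranking n candidates takes n prompts for point-wise, n(n-1)/2 for pair-wise, and one for list-wise. This difference in the number of model calls is one plausible factor in a cost comparison, although the unit cost analysis in the study may be defined differently.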