In this paper, we present an empirical study of conversational recommendation using representative large language models (LLMs) in a zero-shot setting. Our contributions are threefold. (1) Data: To gain insight into model behavior in "in-the-wild" conversational recommendation scenarios, we construct a new dataset of recommendation-related conversations by scraping a popular discussion website. It is the largest public real-world conversational recommendation dataset to date. (2) Evaluation: On the new dataset and two existing conversational recommendation datasets, we observe that, even without fine-tuning, LLMs can outperform existing fine-tuned conversational recommendation models. (3) Analysis: We propose various probing tasks to investigate the mechanisms behind the remarkable performance of LLMs in conversational recommendation. We analyze both the LLMs' behaviors and the characteristics of the datasets, which gives a holistic understanding of the models' effectiveness and limitations. We also suggest directions for the design of future conversational recommenders.