Conversational recommender systems (CRSs) aim to provide recommendations through natural language conversations. Developing an effective CRS requires high-quality CRS datasets. However, existing CRS datasets suffer from the long-tail issue, \ie a large proportion of items, called long-tail items, are rarely (or even never) mentioned in the conversations. As a result, CRSs trained on these datasets tend to recommend frequent items. This can greatly reduce the diversity of the recommended items and make users more likely to get bored. To address this issue, this paper presents \textbf{LOT-CRS}, a novel framework that simulates and utilizes a balanced CRS dataset (\ie one covering all items evenly) to improve the \textbf{LO}ng-\textbf{T}ail recommendation performance of CRSs. In our approach, we design two pre-training tasks that enhance the understanding of simulated conversations about long-tail items. We then adopt retrieval-augmented fine-tuning with a label smoothness strategy to further improve the recommendation of long-tail items. Extensive experiments on two public CRS datasets demonstrate the effectiveness and extensibility of our approach, especially for long-tail recommendation.