Conversational recommender systems (CRSs) aim to recommend high-quality items to users through a dialogue interface. A CRS usually involves multiple sub-tasks, such as user preference elicitation, recommendation, explanation, and item information search. Developing an effective CRS raises three challenges: 1) how to properly manage sub-tasks; 2) how to effectively solve different sub-tasks; and 3) how to correctly generate responses that interact with users. Recently, large language models (LLMs) have exhibited an unprecedented ability to reason and generate, presenting a new opportunity to develop more powerful CRSs. In this work, we propose a new LLM-based CRS, referred to as LLMCRS, to address the above challenges. For sub-task management, we leverage the reasoning ability of the LLM to manage sub-tasks effectively. For sub-task solving, we have the LLM collaborate with expert models for the different sub-tasks to achieve enhanced performance. For response generation, we use the generation ability of the LLM as a language interface to better interact with users. Specifically, LLMCRS divides the workflow into four stages: sub-task detection, model matching, sub-task execution, and response generation. LLMCRS also uses schema-based instruction, demonstration-based instruction, dynamic sub-task and model matching, and summary-based generation to instruct the LLM to generate the desired results in the workflow. Finally, to adapt the LLM to conversational recommendation, we propose fine-tuning it with reinforcement learning from CRS performance feedback, referred to as RLPF. Experimental results on benchmark datasets show that LLMCRS with RLPF outperforms existing methods.
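To make the four-stage workflow concrete, the following is a minimal illustrative sketch, not the LLMCRS implementation. It assumes keyword rules as a stand-in for the LLM's reasoning in sub-task detection and a string template as a stand-in for LLM response generation. The expert functions, the `EXPERTS` registry, and all function names are hypothetical placeholders introduced only for illustration.

```python
from typing import Callable, Dict

# Hypothetical expert models, one per sub-task. A real system would call
# trained models here; these stubs return fixed placeholder strings.
def elicit_preferences(utterance: str) -> str:
    return "Which genres do you usually enjoy?"

def recommend(utterance: str) -> str:
    return "candidate items: item_a, item_b"

def explain(utterance: str) -> str:
    return "item_a matches the preferences stated earlier"

def search_item_info(utterance: str) -> str:
    return "item_a: placeholder metadata"

# Hypothetical registry mapping each sub-task name to its expert model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "preference_elicitation": elicit_preferences,
    "recommendation": recommend,
    "explanation": explain,
    "item_search": search_item_info,
}

def detect_subtask(utterance: str) -> str:
    """Stage 1, sub-task detection (assumption: keyword rules replace the LLM)."""
    text = utterance.lower()
    if "why" in text:
        return "explanation"
    if "recommend" in text or "suggest" in text:
        return "recommendation"
    if "who" in text or "when" in text:
        return "item_search"
    return "preference_elicitation"

def match_model(subtask: str) -> Callable[[str], str]:
    """Stage 2, model matching: pick the expert registered for the sub-task."""
    return EXPERTS[subtask]

def execute_subtask(model: Callable[[str], str], utterance: str) -> str:
    """Stage 3, sub-task execution: run the matched expert on the utterance."""
    return model(utterance)

def generate_response(subtask: str, result: str) -> str:
    """Stage 4, response generation (assumption: a template replaces the LLM)."""
    return f"[{subtask}] {result}"

def respond(utterance: str) -> str:
    """Run the four stages in order for one user turn."""
    subtask = detect_subtask(utterance)
    model = match_model(subtask)
    result = execute_subtask(model, utterance)
    return generate_response(subtask, result)

if __name__ == "__main__":
    print(respond("Can you recommend a sci-fi movie?"))
```

Keeping each stage as a separate function mirrors the staged decomposition described in the abstract: in an LLM-based system, stages 1, 2, and 4 would be prompts to the LLM, while stage 3 would dispatch to the selected expert model.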