REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives

Kun Su, Krishna Sayana, Hubert Pham, James Pine, Yuri Vasilevski, Raghavendra Vasudeva, Marialena Kyriakidi, Liam Hebert, Ambarish Jash, Anushya Subbiah, Sukhdeep Sodhi

This paper introduces REGEN (Reviews Enhanced with GEnerative Narratives), a novel dataset designed to benchmark the conversational capabilities of recommender Large Language Models (LLMs). It addresses a limitation of existing datasets, which focus primarily on sequential item prediction. REGEN extends the Amazon Product Reviews dataset by inpainting two key natural language features: (1) user critiques, which represent the user "steering" queries that lead to the selection of a subsequent item, and (2) narratives, rich textual outputs associated with each recommended item that take prior context into account. The narratives include product endorsements, purchase explanations, and summaries of user preferences. Further, we establish an end-to-end modeling benchmark for the task of conversational recommendation, where models are trained to generate both recommendations and corresponding narratives conditioned on user history (items and critiques). For this joint task, we introduce LUMEN (LLM-based Unified Multi-task Model with Critiques, Recommendations, and Narratives), a modeling framework that uses an LLM as a backbone for critiquing, retrieval, and generation. We also evaluate the dataset's quality using standard auto-rating techniques and benchmark it by training both traditional and LLM-based recommender models. Our results demonstrate that incorporating critiques enhances recommendation quality by enabling the recommender to learn language understanding and integrate it with recommendation signals. Furthermore, LLMs trained on our dataset effectively generate both recommendations and contextual narratives, achieving performance comparable to state-of-the-art recommenders and language models.
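To make the joint task described in the abstract more concrete, the following is a minimal sketch of how one training example might be laid out: a history of items with optional critiques as input, and the next item plus a narrative as output. The class names, field names, prompt format, and sample data are all hypothetical illustrations; they are not the released REGEN schema or the LUMEN input format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One step in a user's history (hypothetical schema, not the released format)."""
    item_title: str                 # item the user interacted with
    critique: Optional[str] = None  # "steering" query that led to the next item


@dataclass
class Example:
    """A joint example: history in, next item plus narrative out (assumed layout)."""
    history: List[Turn]
    target_item: str
    narrative: str  # e.g. a purchase explanation or product endorsement


def build_prompt(example: Example) -> str:
    """Flatten items and critiques into one text prompt for a language model."""
    lines = []
    for i, turn in enumerate(example.history, start=1):
        lines.append(f"Item {i}: {turn.item_title}")
        if turn.critique:
            lines.append(f"Critique {i}: {turn.critique}")
    lines.append("Next item and narrative:")
    return "\n".join(lines)


def build_target(example: Example) -> str:
    """Join the recommended item and its narrative into one training target."""
    return f"{example.target_item}\n{example.narrative}"


# Made-up sample data, for illustration only.
example = Example(
    history=[
        Turn("Trail running shoes", "I'd like something more waterproof"),
        Turn("Waterproof hiking boots"),
    ],
    target_item="Wool hiking socks",
    narrative="Pairs well with the boots you chose for wet trails.",
)
prompt = build_prompt(example)
target = build_target(example)
```

Placing the item and the narrative in a single target string mirrors the end-to-end framing in the abstract, where one model produces both outputs. A real system would more likely represent items by IDs or embeddings than by titles.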

