High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing work invests heavy human effort in manual labeling or in designing and extending recommender dialogue templates. However, these approaches suffer from two limitations: (i) the limited number of human annotators means that datasets can hardly capture the rich, large-scale cases found in the real world, and (ii) the limited experience and knowledge of annotators lead to uninformative corpora and inappropriate recommendations. In this paper, we propose a novel automatic dataset synthesis approach that generates large-scale, high-quality recommendation dialogues through a data2text generation process, in which unstructured recommendation conversations are generated from structured graphs built on real-world user-item information. In doing so, we comprehensively exploit (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversational ability contained in human-to-human conversational recommendation datasets. Extensive experiments validate the benefit of the automatically synthesized data in low-resource scenarios and demonstrate its promising potential to facilitate the development of more effective conversational recommendation systems.
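To make the data2text idea concrete, the following is a minimal toy sketch, not the method proposed in the paper. It assumes a structured input consisting of one user's liked items plus knowledge-graph triples, and it renders a two-turn dialogue with fixed templates. All names (`UserItemGraph`, `recommend`, `verbalize`) and the placeholder items are hypothetical and chosen only for illustration; a real system would presumably replace the templates with a learned generator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserItemGraph:
    """Toy structured input: a user's liked items plus knowledge-graph facts."""
    user: str
    liked: list[str]
    # (head, relation, tail) triples; a hypothetical stand-in for a knowledge graph
    triples: list[tuple[str, str, str]] = field(default_factory=list)

    def facts_about(self, item: str) -> list[tuple[str, str]]:
        """Return the (relation, tail) pairs whose head is the given item."""
        return [(r, t) for h, r, t in self.triples if h == item]


def recommend(graph: UserItemGraph, candidates: list[str]):
    """Pick the first unseen candidate sharing a fact with any liked item."""
    liked_facts = {f for item in graph.liked for f in graph.facts_about(item)}
    for cand in candidates:
        shared = [f for f in graph.facts_about(cand) if f in liked_facts]
        if cand not in graph.liked and shared:
            return cand, shared[0]
    return None


def verbalize(graph: UserItemGraph, candidates: list[str]) -> list[str]:
    """Turn the structured graph into an unstructured two-turn dialogue."""
    liked = graph.liked[0]
    turns = [f"User: I really enjoyed {liked}. Any suggestions?"]
    pick = recommend(graph, candidates)
    if pick is None:
        turns.append("System: I could not find a related title.")
    else:
        item, (relation, tail) = pick
        turns.append(
            f"System: You might like {item}; "
            f"it shares the {relation} {tail} with {liked}."
        )
    return turns


# Placeholder data, not taken from any real dataset.
example = UserItemGraph(
    user="u1",
    liked=["Movie A"],
    triples=[
        ("Movie A", "director", "Director X"),
        ("Movie B", "director", "Director X"),
    ],
)
for turn in verbalize(example, ["Movie B"]):
    print(turn)
```

The sketch separates item selection (`recommend`) from surface realization (`verbalize`) so that the three information sources named above, user profiles, knowledge-graph facts, and conversational phrasing, each map to a distinct piece of the pipeline.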