Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation

Recommendation systems are ubiquitous yet often difficult for users to control and adjust when recommendation quality is poor. This has motivated the development of conversational recommendation systems (CRSs), which give users control over recommendations through natural language feedback. However, building a CRS requires conversational training data in which user utterances are paired with items, covering a diverse range of preferences. Such data has proved challenging to collect at scale using conventional methods like crowdsourcing. We address this challenge in the context of item-set recommendation, a task receiving increasing attention because of use cases like music, news, and recipe recommendation. We present a new technique, TalkTheWalk, that synthesizes realistic, high-quality conversational data by leveraging the domain expertise encoded in widely available curated item collections, and we show how these collections can be transformed into corresponding item-set curation conversations. Specifically, TalkTheWalk generates a sequence of hypothetical yet plausible item sets returned by a system, then uses a language model to produce the corresponding user utterances. Applying TalkTheWalk to music recommendation, we generate over one million diverse playlist curation conversations. A human evaluation shows that the conversations contain consistent utterances with relevant item sets, nearly matching the quality of a small human-collected conversational dataset for this task. Moreover, when the synthetic corpus is used to train a CRS, it improves Hits@100 by 10.5 points on a benchmark dataset over standard baselines, and the resulting system is preferred over the top-performing baseline in an online evaluation.
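
For readers unfamiliar with the metric, the following is a minimal sketch of how a Hits@k score can be computed. It assumes one common definition, in which each evaluation example has a single target item and a hit is counted when that item appears among the top k ranked results. The abstract does not spell out the exact definition, so the function name `hits_at_k` and its signature are illustrative assumptions rather than the evaluation code behind the reported numbers.

```python
from typing import Hashable, Sequence


def hits_at_k(
    ranked_lists: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int = 100,
) -> float:
    """Fraction of examples whose target item appears in the top-k ranking.

    Assumption: one target item per example (a common Hits@k definition;
    the source abstract does not state which variant is used).
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    # Count examples where the target is among the first k ranked items.
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Toy usage with k=2: the first target is ranked second (hit),
# the second target is ranked third (miss).
score = hits_at_k([["a", "b", "c"], ["d", "e", "f"]], ["b", "f"], k=2)
```

Under this definition the score is a fraction between 0 and 1. If Hits@100 is reported as a percentage, a gain of 10.5 points would correspond to a difference of 0.105 in this fraction.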
