Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation

Recommender systems are ubiquitous yet often difficult for users to control and to adjust when recommendation quality is poor. This has motivated conversational recommender systems (CRSs), which provide control through natural language feedback. However, as in most application domains, building robust CRSs requires training data that reflects system usage: here, conversations in which user utterances are paired with items, covering a wide range of preferences. Such data has proved challenging to collect at scale using conventional methods. We address the question of whether it can be generated synthetically, building on recent advances in natural language processing. We evaluate in the setting of item set recommendation, noting the increasing attention to this task motivated by use cases like music, news, and recipe recommendation. We present TalkTheWalk, which synthesizes realistic, high-quality conversational data by leveraging domain expertise encoded in widely available curated item collections: it generates a sequence of hypothetical yet plausible item sets, then uses a language model to produce corresponding user utterances. We generate over one million diverse playlist curation conversations in the music domain, and show that these contain consistent utterances with relevant item sets, nearly matching the quality of an existing but small human-collected dataset for this task. We demonstrate the utility of the generated synthetic dataset on a conversational item retrieval task and show that training on it improves over both unsupervised baselines and systems trained on a real dataset.
