A conversational music retrieval system can help users discover music that matches their preferences through dialogue. To achieve this, a conversational music retrieval system should seamlessly engage in multi-turn conversation by 1) understanding user queries and 2) responding with natural language and retrieved music. A straightforward solution would be a data-driven approach utilizing music discovery conversation logs. However, few such datasets are available for research, and those that exist are limited in volume and quality. In this paper, we present a data generation framework for rich music discovery dialogue that combines a large language model (LLM) with user intents, system actions, and musical attributes. The framework consists of i) dialogue intent analysis using grounded theory, ii) attribute sequence generation via cascading database filtering, and iii) utterance generation using large language models. By applying this framework to the Million Song Dataset, we create LP-MusicDialog, a Large Language Model based Pseudo Music Dialogue dataset, containing over 288k music conversations covering more than 319k music items. Our evaluation shows that the synthetic dataset is competitive with an existing, small human dialogue dataset in terms of dialogue consistency, item relevance, and naturalness. Furthermore, we train a conversational music retrieval model on the dataset and show promising results.
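To make step ii concrete, the cascading database filtering described above can be sketched as follows. This is an illustrative toy example, not the authors' implementation: the attribute names, items, and the `cascade_filter` helper are all hypothetical, and the real framework operates over the Million Song Dataset rather than a hand-written list.

```python
# Toy sketch (assumed, not from the paper) of cascading attribute filtering:
# apply candidate (attribute, value) filters one by one, keeping only those
# that still leave at least one retrievable music item.

toy_db = [
    {"title": "Song A", "genre": "rock", "mood": "energetic", "era": "1990s"},
    {"title": "Song B", "genre": "rock", "mood": "calm",      "era": "2000s"},
    {"title": "Song C", "genre": "jazz", "mood": "calm",      "era": "1990s"},
]

def cascade_filter(db, attribute_values):
    """Apply filters sequentially; drop any filter that would empty the
    result set. Returns the kept attribute sequence and surviving items."""
    kept, items = [], db
    for attr, value in attribute_values:
        narrowed = [it for it in items if it.get(attr) == value]
        if narrowed:  # keep the filter only if music can still be retrieved
            kept.append((attr, value))
            items = narrowed
    return kept, items

seq, results = cascade_filter(toy_db, [("genre", "rock"), ("mood", "calm")])
print(seq)                                # [('genre', 'rock'), ('mood', 'calm')]
print([it["title"] for it in results])    # ['Song B']
```

The kept attribute sequence (e.g. genre, then mood) can then serve as the skeleton of a multi-turn dialogue, with each filter step corresponding to one user refinement turn.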