Although annotated datasets of music descriptors in user queries are increasingly common, few consider the user's intent behind those descriptors, which is essential for effectively meeting user needs. We introduce MusicRecoIntent, a manually annotated corpus of 2,291 Reddit music requests that labels musical descriptors across seven categories with positive, negative, or referential preference-bearing roles. We then investigate how reliably large language models (LLMs) can extract these descriptors, finding that they capture explicit descriptors well but struggle with context-dependent ones. This work can serve as a benchmark for fine-grained modeling of user intent and offers insights for improving LLM-based music understanding systems.