As online music consumption increasingly shifts toward playlist-based listening, playlist continuation, the task in which an algorithm suggests songs that extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming. Many existing playlist continuation approaches rely on collaborative filtering to make recommendations. However, such methods struggle to recommend songs that lack interaction data, an issue known as the cold-start problem. Current approaches to this challenge design complex mechanisms for extracting relational signals from sparse collaborative data and integrating them into content representations. Yet these approaches leave content representation learning out of scope and use frozen, pre-trained content models that may not be aligned with the distribution or format of a specific musical setting. Furthermore, even state-of-the-art music content models are either (1) incompatible with the cold-start setting or (2) unable to effectively integrate cross-modal and relational signals. In this paper, we introduce LARP, a multi-modal cold-start playlist continuation model designed to overcome these limitations. LARP is a three-stage contrastive learning framework that integrates both multi-modal and relational signals into its learned representations. The framework proceeds through stages of increasing task-specific abstraction: a within-track (language-audio) contrastive loss, a track-track contrastive loss, and a track-playlist contrastive loss. Experimental results on two publicly available datasets demonstrate the efficacy of LARP over uni-modal and multi-modal models for playlist continuation in a cold-start setting. Code and dataset are released at: https://github.com/Rsalganik1123/LARP.
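The abstract names three contrastive losses but does not spell out their form. As a hedged illustration only, the sketch below implements a generic symmetric InfoNCE loss in pure Python, one common way to build such objectives; whether LARP uses this exact formulation, temperature, or batching scheme is an assumption, not something stated above. The function `info_nce` and its `temperature` default are hypothetical. Under this reading, each stage would feed in different positive pairs: a track's text and audio embeddings (within-track), two tracks that co-occur in a playlist (track-track), and a track and its playlist's embedding (track-playlist), with the other rows of the batch acting as negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so dot products become cosine similarities (zero vectors stay zero).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: List[float], idx: int) -> float:
    # Numerically stable log-softmax of logits, evaluated at position idx.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[idx] - log_sum


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE: anchors[i] pairs with positives[i];
    every other row in the batch serves as a negative."""
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("anchors and positives must be non-empty and equal in length")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    n = len(a)
    # sim[i][j] = cosine(anchor i, positive j) / temperature
    sim = [[sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
           for i in range(n)]
    # Anchor -> positive direction: each row should peak on its diagonal entry.
    loss_a2p = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Positive -> anchor direction: each column should peak on its diagonal entry.
    loss_p2a = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
                    for i in range(n)) / n
    return 0.5 * (loss_a2p + loss_p2a)


if __name__ == "__main__":
    # Toy within-track example: 2-D "text" and "audio" embeddings for two tracks.
    text = [[1.0, 0.0], [0.0, 1.0]]
    audio_aligned = [[1.0, 0.0], [0.0, 1.0]]
    audio_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(text, audio_aligned))  # near zero: pairs already match
    print(info_nce(text, audio_swapped))  # large: each text matches the wrong audio
```

The symmetric form averages both matching directions, so the same function can be reused unchanged whether the "anchor" side is a caption, a track, or a playlist representation.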