Music streaming services often aim to recommend songs that extend the playlists users have created on these services. However, extending a playlist while preserving its musical characteristics and matching user preferences remains a challenging task, commonly referred to as Automatic Playlist Continuation (APC). Moreover, although these services often need to select the best songs to recommend in real time, from large catalogs with millions of candidates, recent research on APC has mainly focused on models that offer few scalability guarantees and are evaluated on relatively small datasets. In this paper, we introduce a general framework to build scalable yet effective APC models for large-scale applications. Based on a represent-then-aggregate strategy, it ensures scalability by design while remaining flexible enough to incorporate a wide range of representation learning and sequence modeling techniques, e.g., those based on Transformers. We demonstrate the relevance of this framework through in-depth experimental validation on Spotify's Million Playlist Dataset (MPD), the largest public dataset for APC. We also describe how, in 2022, we successfully leveraged this framework to improve APC in production on Deezer. We report results from a large-scale online A/B test on this service, emphasizing the practical impact of our approach in such a real-world application.
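The represent-then-aggregate strategy named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the models involved, so this example assumes precomputed song embeddings, uses a plain average as the aggregation step (where the framework could plug in a sequence model such as a Transformer), and ranks candidates by dot product. All names here (`aggregate`, `continue_playlist`, `TOY_EMBEDDINGS`) and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def aggregate(song_vectors: Sequence[Vector]) -> Vector:
    """Aggregate song vectors into one playlist vector.

    Assumption: a simple mean. A learned sequence model could replace it.
    """
    n = len(song_vectors)
    return [sum(column) / n for column in zip(*song_vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the playlist-to-song similarity score."""
    return sum(x * y for x, y in zip(a, b))


def continue_playlist(
    playlist: Sequence[str],
    song_embeddings: Dict[str, Vector],
    k: int = 2,
) -> List[str]:
    """Return the k highest-scoring songs that are not already in the playlist."""
    # Represent: look up the vector of each song in the playlist.
    playlist_vector = aggregate([song_embeddings[song] for song in playlist])
    # Score every unseen song against the single playlist vector.
    seen = set(playlist)
    candidates = [song for song in song_embeddings if song not in seen]
    candidates.sort(
        key=lambda song: dot(song_embeddings[song], playlist_vector),
        reverse=True,
    )
    return candidates[:k]


# Hypothetical two-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "rock_a": [1.0, 0.0],
    "rock_b": [0.9, 0.1],
    "rock_c": [0.8, 0.2],
    "jazz_a": [0.0, 1.0],
    "jazz_b": [0.1, 0.9],
}

if __name__ == "__main__":
    print(continue_playlist(["rock_a", "rock_b"], TOY_EMBEDDINGS, k=2))
```

Because the playlist is reduced to a single vector before any candidate is scored, the per-request cost is one aggregation plus a similarity search over the catalog. In this sketch the search is an exhaustive loop; at the scale of millions of songs it would presumably be replaced by a nearest-neighbor index, which is an assumption about deployment rather than something the abstract states.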