In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea by using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models on music multi-label classification tasks (genre, mood, and instrument tagging) and on music similarity. We find that creating anchor and positive track pairs from co-occurrences in playlists yields better music similarity and competitive classification results compared to choosing tracks from the same artist, as in previous works. Additionally, our best playlist-based pre-training approach provides superior classification performance on most datasets.
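As an illustration of the general idea, the sketch below builds anchor/positive pairs from tracks that share a group (a playlist, or an artist for the previous-work baseline) and scores a batch of paired embeddings with an InfoNCE-style contrastive loss that uses the other in-batch positives as negatives. This is a minimal sketch under stated assumptions: the helper names `group_pairs` and `info_nce_loss`, the temperature value, and the use of in-batch negatives are hypothetical and are not taken from the paper; its three pair-generation strategies and exact loss are not described here.

```python
from itertools import combinations

import numpy as np


def group_pairs(groups):
    """Hypothetical helper: return sorted (anchor, positive) pairs of distinct
    track IDs that co-occur in at least one group.

    A group can be a playlist (co-occurrence pairs) or the set of tracks by
    one artist (same-artist pairs, as in earlier metadata-based work).
    """
    pairs = set()
    for group in groups:
        # Deduplicate and sort so each unordered pair is emitted once as (a, b), a < b.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return sorted(pairs)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of `positives` is the positive
    for row i of `anchors`; every other row in the batch acts as a negative."""
    # L2-normalise so the dot products below are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (batch, batch)
    # Subtract the row max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the matching (diagonal) positive.
    return float(-np.mean(np.diag(log_probs)))
```

For example, `group_pairs([["t1", "t2", "t3"], ["t2", "t4"]])` treats each list as a playlist and returns `[("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t2", "t4")]`. Passing artist groups instead of playlists to the same helper would produce same-artist pairs, which is one way to set up the comparison the abstract describes.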