Recent developments in music information retrieval (MIR) have led to several benchmark deep learning models whose embeddings can be used for a variety of downstream tasks. However, the vast majority of these models have been trained on Western pop/rock music and related styles. This raises the questions of whether these models can be used to learn representations for different music cultures and styles, and whether we can build similar music audio embedding models trained on data from those cultures or styles. To that end, we leverage transfer learning methods to derive insights about the similarities between the music cultures to which the data belong. We use two Western music datasets, two traditional/folk datasets from eastern Mediterranean cultures, and two datasets of Indian art music. Three deep audio embedding models, two CNN-based and one Transformer-based, are trained and transferred across domains to perform auto-tagging on each target-domain dataset. Experimental results show that transfer learning achieves competitive performance in all domains, while the best source dataset varies by music culture. The implementation and the trained models are both provided in a public repository.