From West to East: Who can understand the music of the others better?

Recent developments in music information retrieval (MIR) have produced several benchmark deep learning models whose embeddings can be used for a variety of downstream tasks. However, the vast majority of these models have been trained on Western pop/rock music and related styles. This raises two research questions: can these models be used to learn representations for other music cultures and styles, and can similar music audio embedding models be built from data drawn from those cultures or styles? To address them, we use transfer learning to derive insights about the similarities between the music cultures to which the data belong. We use two Western music datasets, two traditional/folk datasets from eastern Mediterranean cultures, and two datasets of Indian art music. Three deep audio embedding models, two CNN-based architectures and one Transformer-based architecture, are trained and transferred across domains to perform auto-tagging on each target-domain dataset. Experimental results show that transfer learning achieves competitive performance in all domains, while the best source dataset varies for each music culture. The implementation and the trained models are both provided in a public repository.
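The transfer setup described above can be illustrated with a small sketch. It is not the authors' implementation: the paper's models are CNN- and Transformer-based networks, whereas here the "frozen embeddings" are synthetic random vectors standing in for the output of a model pre-trained on a source dataset, and the target-domain tag labels follow a made-up rule. Only the general idea is shown: the source model's embeddings are kept fixed, and a new multi-label sigmoid head is trained on the target dataset with binary cross-entropy. All names (`train_tag_head`, `predict_tags`, the dimensions and hyperparameters) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tag_probs(W, b, x):
    """Per-tag probabilities for one embedding x (multi-label, so no softmax)."""
    return [sigmoid(sum(w * v for w, v in zip(W[k], x)) + b[k]) for k in range(len(b))]

def bce_loss(W, b, X, Y):
    """Mean binary cross-entropy over all clips and tags."""
    total = 0.0
    for x, y in zip(X, Y):
        for p, t in zip(tag_probs(W, b, x), y):
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / (len(X) * len(b))

def train_tag_head(X, Y, n_tags, epochs=200, lr=0.5):
    """Fit a new tagging head on fixed embeddings X; the embeddings are never updated."""
    dim, n = len(X[0]), len(X)
    W = [[0.0] * dim for _ in range(n_tags)]
    b = [0.0] * n_tags
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_tags)]
        gb = [0.0] * n_tags
        for x, y in zip(X, Y):
            for k, p in enumerate(tag_probs(W, b, x)):
                err = p - y[k]  # gradient of BCE with respect to the logit
                gb[k] += err
                for j in range(dim):
                    gW[k][j] += err * x[j]
        for k in range(n_tags):
            b[k] -= lr * gb[k] / n
            for j in range(dim):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

def predict_tags(W, b, x, threshold=0.5):
    """Binary tag decisions for one clip embedding."""
    return [p >= threshold for p in tag_probs(W, b, x)]

# Synthetic stand-in for embeddings produced by a frozen source-domain model (assumption).
random.seed(0)
DIM, N_TAGS, N_CLIPS = 8, 3, 40
X = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(N_CLIPS)]
# Made-up target-domain labels: tag k is active when embedding coordinate k is positive.
Y = [[1 if x[k] > 0 else 0 for k in range(N_TAGS)] for x in X]

zero_W = [[0.0] * DIM for _ in range(N_TAGS)]
zero_b = [0.0] * N_TAGS
initial_loss = bce_loss(zero_W, zero_b, X, Y)  # untrained head
W, b = train_tag_head(X, Y, N_TAGS)
final_loss = bce_loss(W, b, X, Y)
print(f"BCE before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In a cross-cultural comparison of the kind the abstract describes, a head like this would be trained once per (source model, target dataset) pair and the resulting tagging scores compared. Fine-tuning the whole network instead of only the head is the other common transfer variant, and this sketch does not cover it.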

