Music Foundation Model as Generic Booster for Music Downstream Tasks

WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong, Chieh-Hsin Lai, Giorgio Fabbro, Kazuki Shimada, Keisuke Toyama, Kinwai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi, Stefan Uhlich, Taketo Akama, Woosung Choi, Yuichiro Koyama, Yuki Mitsufuji

From arXiv, 41 pages with 14 figures

We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, which improves performance across downstream tasks of both the understanding and the generative kind. We evaluate this approach on four representative tasks: music tagging, music transcription, music source separation, and music mixing. Our results show that features extracted from a foundation model provide valuable enhancements when training downstream task models, which highlights their capability as a booster for downstream tasks. The approach benefits existing task-specific models and also supports music downstream tasks constrained by data scarcity, paving the way for more effective and accessible music processing solutions.
