In an era of extensive intersection between art and Artificial Intelligence (AI), as seen in image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal, community-driven benchmark. To address this gap, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels: acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks across 8 publicly available datasets. This protocol provides a fair and standardized assessment of the representations of all open-source pre-trained models developed on music recordings, which serve as baselines. In addition, MARBLE offers an easy-to-use, extensible, and reproducible suite for the community, with a clear statement on dataset copyright issues. Results suggest that recently proposed large-scale pre-trained musical language models perform best on most tasks, though there is still room for further improvement. The leaderboard and toolkit repository are published at https://marble-bm.shef.ac.uk to promote future music AI research.