Music Structure Analysis is an open research task in Music Information Retrieval (MIR). Several previous works have attempted to segment music in the audio and symbolic domains; however, identifying and segmenting music structure at different levels remains an open research problem. In this work we propose three methods, two of which are novel graph-based algorithms, that segment symbolic music by its form or structure: Norm, G-PELT and G-Window. We performed an ablation study on two public datasets with different forms or structures, varying the parameter values of each method and comparing performance across music styles. We found that encoding symbolic music as graphs and computing the novelty of the adjacency matrices obtained from those graphs represents the structure of symbolic music pieces well, without the need to extract features from them. We detect boundaries with an online unsupervised changepoint detection method, reaching an F_1 of 0.5640 at a 1-bar tolerance on one of the public datasets used for testing. We also report the performance of the algorithms at three levels of structure (high, medium and low) to show how the parameters of the proposed methods have to be adjusted depending on the level. We added the best-performing method and its parameters for each structure level to musicaiz, an open-source Python package, to facilitate the reproducibility and usability of this work. We hope that these methods can be used to improve other MIR tasks such as music generation with structure, music classification or key change detection.
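For readers unfamiliar with tolerance-based boundary scores, the sketch below shows one common way an F_1 at a 1-bar tolerance could be computed. It is an illustration only: the abstract does not state the exact matching procedure, so the one-to-one matching rule, the function name `boundary_f1`, and the example boundary positions are all assumptions rather than the authors' evaluation code.

```python
def boundary_f1(reference, estimated, tolerance=1):
    """Return (precision, recall, F1) for boundary positions given in bars.

    Assumption: an estimated boundary counts as a hit when it lies within
    `tolerance` bars of a reference boundary, and each boundary on either
    side can be matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1  # matched pair: consume both boundaries
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimate is too early to match this or any later reference
        else:
            i += 1  # reference is too early to match this or any later estimate
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical boundaries (bar indices), not taken from the datasets.
precision, recall, f1 = boundary_f1([8, 16, 24, 32], [9, 15, 20, 33, 40])
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.4f}")
```

In this made-up example, three of the five estimated boundaries fall within one bar of a reference boundary, so precision is 3/5 and recall is 3/4.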