Music Structure Analysis (MSA) is the task of identifying the musical segments that compose a music track and, possibly, labeling them based on their similarity. In this paper, we propose a supervised approach to music boundary detection in which features and convolution kernels are learned simultaneously. To this end, we jointly optimize two losses: (i) a loss based on the Self-Similarity Matrix (SSM) obtained with the learned features, denoted SSM-loss, and (ii) a loss based on the novelty score obtained by applying the learned kernels to the estimated SSM, denoted novelty-loss. We also demonstrate that relative feature learning, through self-attention, is beneficial for MSA. Finally, we compare the performance of our approach to that of previously proposed approaches on the standard RWC-Pop dataset and on various subsets of SALAMI.
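To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how an SSM and a novelty score can be computed from a feature sequence. It assumes cosine similarity for the SSM and replaces the learned convolution kernels with a fixed checkerboard kernel slid along the main diagonal (the classical hand-crafted choice); the function names `cosine_ssm`, `checkerboard_kernel`, and `novelty_curve` are hypothetical and chosen for illustration only.

```python
import numpy as np


def cosine_ssm(features):
    """Return the (T, T) cosine self-similarity matrix of a (T, D) feature sequence."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def checkerboard_kernel(half_width):
    """Fixed (2L, 2L) checkerboard kernel (ASSUMPTION: stand-in for a learned kernel).

    Blocks comparing past-with-past and future-with-future are +1,
    blocks comparing past-with-future are -1.
    """
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(sign, sign)


def novelty_curve(ssm, kernel):
    """Slide the kernel along the main diagonal of the SSM.

    novelty[t] correlates the kernel with the window covering frames
    t - L ... t + L - 1, where L is half the kernel size.
    """
    half = kernel.shape[0] // 2
    padded = np.pad(ssm, half, mode="constant")  # zero-pad so every frame has a full window
    n_frames = ssm.shape[0]
    novelty = np.empty(n_frames)
    for t in range(n_frames):
        window = padded[t:t + 2 * half, t:t + 2 * half]
        novelty[t] = np.sum(window * kernel)
    return novelty


# Toy input: two homogeneous 10-frame segments with orthogonal features,
# so the only segment change occurs at frame 10.
features = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 1.0], (10, 1))])
ssm = cosine_ssm(features)
novelty = novelty_curve(ssm, checkerboard_kernel(4))
print(int(np.argmax(novelty)))  # index of the strongest novelty peak
```

In the approach described above, both the features fed to the SSM and the kernels applied to it are learned under the SSM-loss and novelty-loss; this sketch fixes both and only illustrates the forward computation that such losses would be defined on.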