The musical motif, a conceptual building block of composition, is crucial for music structure analysis and automatic composition. While human listeners can identify motifs easily, existing computational models fall short in representing motifs and their developments. This is because motifs are implicit by nature, and their variations go well beyond simple repetition and modulation. In this study, we aim to learn the implicit relationship between motifs and their variations through representation learning, using a Siamese network architecture and a pretraining and fine-tuning pipeline. A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning. Experimental results on a retrieval-based task show that the two methods complement each other, yielding an improvement of 12.6% in the area under the precision-recall curve. Lastly, we visualize the learned motif representations, which offer an intuitive view of the overall structure of a music piece. To our knowledge, this work is a notable step forward in the computational modeling of musical motifs, and we believe it lays the foundation for future applications of motifs in automatic music composition and music information retrieval.
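The abstract names the two training objectives but not their exact form. As a rough illustration only, the sketch below implements the VICReg loss as it is usually formulated (an invariance term plus variance and covariance regularizers, with the commonly used weights 25/25/1) for the pretraining stage, and a one-directional InfoNCE loss as a stand-in for the fine-tuning stage. The choice of InfoNCE, the temperature of 0.1, and all function names are our assumptions; the actual losses, encoders, and hyperparameters in this work may differ. Embeddings are plain lists of floats so the sketch runs without dependencies.

```python
import math
from typing import List

# A batch of embeddings: one row per excerpt, one column per embedding dimension.
Matrix = List[List[float]]


def _centered_columns(z: Matrix) -> List[List[float]]:
    """Return each embedding dimension as a mean-centered list over the batch."""
    centered = []
    for col in zip(*z):
        mean = sum(col) / len(col)
        centered.append([x - mean for x in col])
    return centered


def invariance_term(za: Matrix, zb: Matrix) -> float:
    """Mean squared error between paired embeddings (e.g. a motif and its variation)."""
    n, d = len(za), len(za[0])
    return sum((a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb)) / (n * d)


def variance_term(z: Matrix, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge penalty on dimensions whose batch std falls below gamma (prevents collapse)."""
    n = len(z)
    cols = _centered_columns(z)
    hinges = [max(0.0, gamma - math.sqrt(sum(x * x for x in c) / (n - 1) + eps)) for c in cols]
    return sum(hinges) / len(cols)


def covariance_term(z: Matrix) -> float:
    """Sum of squared off-diagonal covariances divided by d (decorrelates dimensions)."""
    n = len(z)
    cols = _centered_columns(z)
    d = len(cols)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov_ij = sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
                total += cov_ij ** 2
    return total / d


def vicreg_loss(za: Matrix, zb: Matrix, lam: float = 25.0, mu: float = 25.0, nu: float = 1.0) -> float:
    """Weighted sum of the invariance term and both branches' variance and covariance terms."""
    return (lam * invariance_term(za, zb)
            + mu * (variance_term(za) + variance_term(zb))
            + nu * (covariance_term(za) + covariance_term(zb)))


def info_nce_loss(za: Matrix, zb: Matrix, temperature: float = 0.1) -> float:
    """One-directional InfoNCE: za[i] must identify its partner zb[i] among all rows of zb."""
    def normalize(v: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(row) for row in za]
    b = [normalize(row) for row in zb]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate in the batch, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the true pair as the target
    return total / len(a)
```

The two objectives differ in how they avoid the trivial solution where every excerpt maps to the same point: VICReg needs no negative pairs because the variance and covariance terms keep the embedding spread out and decorrelated, whereas the contrastive loss treats the other items in the batch as negatives. For instance, `vicreg_loss(z, z)` is 0.0 for the batch `z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]`, whose two dimensions are already uncorrelated with standard deviation above 1.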