This paper presents an unsupervised machine learning algorithm that identifies recurring patterns, referred to as ``music-words'', in symbolic music data. These patterns are fundamental to musical structure and reflect the cognitive processes involved in composition. Extracting them remains challenging, however, because musical interpretation is inherently semantically ambiguous. We formulate music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework: (1) constructing a music-word dictionary and (2) reconstructing the music data. Evaluated against human expert annotations, the algorithm achieves an Intersection over Union (IoU) score of 0.61. Our findings indicate that minimizing code length effectively resolves semantic ambiguity, suggesting that human optimization of encoding systems shapes musical semantics. The approach enables computers to extract ``basic building blocks'' from music data, facilitating structural analysis and sparse encoding. The method has two primary applications. First, in AI music, it supports downstream tasks such as music generation, classification, style transfer, and improvisation. Second, in musicology, it provides a tool for analyzing compositional patterns and offers insights into the principle of minimal encoding across diverse musical styles and composers.