To address the challenges of interpretability and generalizability in artificial music intelligence, this paper introduces a novel symbolic representation that combines explicit and implicit musical information across diverse traditions and granularities. The representation is a hierarchical and-or graph whose nodes and edges encode a broad range of musical elements, including structures, textures, rhythms, and harmonies; its hierarchy allows music to be represented at multiple scales. This representation serves as the foundation for an energy-based model that learns musical concepts through a flexible algorithmic framework based on the minimax entropy principle. An adapted Metropolis-Hastings sampling technique then gives the model fine-grained control over music generation. A comprehensive empirical evaluation against existing methods shows considerable improvements in interpretability and controllability. The study contributes to music analysis, composition, and computational musicology.
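As a rough illustration of how Metropolis-Hastings sampling over an energy-based model can expose fine-grained control, the sketch below samples a short pitch sequence while holding chosen positions fixed. It is not the paper's model: the `energy` function, the C major constraint, the pitch range, and the `fixed` parameter are all hypothetical stand-ins chosen for brevity, and the sequence is a flat list rather than an and-or graph.

```python
import math
import random

# Hypothetical toy energy, not the learned energy from the paper:
# it penalises large melodic leaps and notes outside C major.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}


def energy(pitches):
    """Lower energy means a smoother, more in-scale sequence of MIDI pitches."""
    leap = sum(abs(a - b) for a, b in zip(pitches, pitches[1:]))
    off_scale = sum(1 for p in pitches if p % 12 not in C_MAJOR)
    return 0.5 * leap + 4.0 * off_scale


def metropolis_hastings(init, steps, fixed=frozenset(), low=60, high=72, seed=0):
    """Sample from p(x) proportional to exp(-energy(x)).

    Positions listed in `fixed` are never resampled, which is one simple
    way to let a user pin down parts of the output.
    """
    rng = random.Random(seed)
    current = list(init)
    free = [i for i in range(len(current)) if i not in fixed]
    if not free:
        return current
    e_cur = energy(current)
    for _ in range(steps):
        # Symmetric proposal: redraw one free position uniformly from the range.
        proposal = list(current)
        proposal[rng.choice(free)] = rng.randint(low, high)
        e_new = energy(proposal)
        # Accept with probability min(1, exp(e_cur - e_new)).
        if e_new <= e_cur or rng.random() < math.exp(e_cur - e_new):
            current, e_cur = proposal, e_new
    return current


if __name__ == "__main__":
    start = [60, 71, 61, 70, 63, 72, 66, 67]
    # Keep the first and last notes as given; resample everything in between.
    sample = metropolis_hastings(start, steps=5000, fixed=frozenset({0, 7}))
    print(sample, energy(sample))
```

Because the proposal is symmetric, the acceptance ratio reduces to the difference in energy, so a different energy function can be swapped in without changing the sampler.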