Music Information Retrieval (MIR) has seen a recent surge in deep learning-based approaches, which often encode symbolic music (i.e., music represented as discrete note events) in an image-like or language-like fashion. However, symbolic music is neither an image nor a sentence, and research in the symbolic domain lacks a comprehensive overview of the available representations. In this paper, we investigate matrix (piano roll), sequence, and graph representations and their corresponding neural architectures, applied to both symbolic scores and symbolic performances, on three piece-level classification tasks. We also introduce a novel graph representation for symbolic performances and explore the capability of graph representations in global classification tasks. Our systematic evaluation shows the advantages and limitations of each input representation. Our results suggest that the graph representation, the newest and least explored of the three approaches, exhibits promising performance while being more lightweight to train.
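To make the three input representations named in the abstract concrete, the following minimal sketch encodes the same toy note list as a piano-roll matrix, a token sequence, and a note graph. It is an illustration under stated assumptions, not the paper's implementation: the note tuples, the token names (`TIME_SHIFT`, `NOTE_ON`, `DURATION`), the edge labels (`onset`, `consecutive`, `during`), and all function names are hypothetical choices made for this example.

```python
# Toy input: (onset_in_beats, duration_in_beats, midi_pitch). Values are made up.
NOTES = [(0.0, 1.0, 60), (0.0, 2.0, 64), (1.0, 1.0, 62), (2.0, 1.0, 67)]


def to_piano_roll(notes, steps_per_beat=4, n_pitches=128):
    """Matrix view: a binary pitch-by-time grid (rows = pitches, columns = time steps)."""
    total_beats = max(onset + dur for onset, dur, _ in notes)
    n_steps = int(round(total_beats * steps_per_beat))
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for onset, dur, pitch in notes:
        start = int(round(onset * steps_per_beat))
        end = int(round((onset + dur) * steps_per_beat))
        for t in range(start, end):
            roll[pitch][t] = 1
    return roll


def to_token_sequence(notes):
    """Sequence view: a flat list of event tokens (hypothetical vocabulary)."""
    tokens = []
    prev_onset = 0.0
    for onset, dur, pitch in sorted(notes):
        if onset > prev_onset:
            # Encode the gap since the previous onset as a time-shift token.
            tokens.append(("TIME_SHIFT", onset - prev_onset))
            prev_onset = onset
        tokens.append(("NOTE_ON", pitch))
        tokens.append(("DURATION", dur))
    return tokens


def to_graph(notes):
    """Graph view: one node per note, typed edges for temporal relations (assumed edge types)."""
    edges = []
    for i, (on_i, dur_i, _) in enumerate(notes):
        for j, (on_j, _, _) in enumerate(notes):
            if i == j:
                continue
            if on_i == on_j:
                edges.append((i, j, "onset"))        # notes start together
            elif on_i + dur_i == on_j:
                edges.append((i, j, "consecutive"))  # j starts when i ends
            elif on_i < on_j < on_i + dur_i:
                edges.append((i, j, "during"))       # j starts while i sounds
    nodes = list(range(len(notes)))
    return nodes, edges


if __name__ == "__main__":
    roll = to_piano_roll(NOTES)
    print(len(roll), len(roll[0]))
    print(to_token_sequence(NOTES))
    print(to_graph(NOTES)[1])
```

The sketch highlights a size difference between the views: the piano roll grows with piece length times pitch range regardless of how many notes sound, whereas the sequence and the graph grow with the number of notes and their relations.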