Music Information Retrieval (MIR) has seen a recent surge in deep learning-based approaches, which often encode symbolic music (i.e., music represented as discrete note events) in an image-like or language-like fashion. However, symbolic music is neither an image nor a sentence, and research in the symbolic domain lacks a comprehensive overview of the available representations. In this paper, we investigate matrix (piano roll), sequence, and graph representations and their corresponding neural architectures, applied to both symbolic scores and performances, on three piece-level classification tasks. We also introduce a novel graph representation for symbolic performances and explore the capability of graph representations in global classification tasks. Our systematic evaluation shows the advantages and limitations of each input representation. Our results suggest that the graph representation, the newest and least explored of the three approaches, exhibits promising performance while being more lightweight to train.
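To make the three input representations concrete, the following is a minimal illustrative sketch of how the same short note list might be turned into a piano roll matrix, a token sequence, and a note graph. It is not the implementation evaluated in this paper: the `Note` type, the integer time grid, the token names (`SHIFT_`, `PITCH_`, `DUR_`), and the three edge types (`onset`, `consecutive`, `during`) are all assumptions chosen for illustration, and the novel performance graph introduced here is not reproduced.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    onset: int     # start time in grid steps (hypothetical unit)
    duration: int  # length in grid steps
    pitch: int     # MIDI pitch number, 0-127


def to_piano_roll(notes: List[Note], n_pitches: int = 128) -> List[List[int]]:
    """Matrix view: rows are pitches, columns are time steps, 1 = sounding."""
    n_steps = max(n.onset + n.duration for n in notes)
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for n in notes:
        for t in range(n.onset, n.onset + n.duration):
            roll[n.pitch][t] = 1
    return roll


def to_sequence(notes: List[Note]) -> List[str]:
    """Sequence view: a flat token stream ordered by onset, then pitch."""
    tokens: List[str] = []
    prev_onset = 0
    for n in sorted(notes, key=lambda n: (n.onset, n.pitch)):
        shift = n.onset - prev_onset
        if shift > 0:
            tokens.append(f"SHIFT_{shift}")  # time elapsed since previous onset
        tokens += [f"PITCH_{n.pitch}", f"DUR_{n.duration}"]
        prev_onset = n.onset
    return tokens


def to_graph(notes: List[Note]) -> List[Tuple[int, int, str]]:
    """Graph view: nodes are note indices, edges are typed by temporal relation."""
    edges: List[Tuple[int, int, str]] = []
    for i, a in enumerate(notes):
        for j, b in enumerate(notes):
            if i == j:
                continue
            if a.onset == b.onset:
                edges.append((i, j, "onset"))        # notes start together
            elif a.onset + a.duration == b.onset:
                edges.append((i, j, "consecutive"))  # b starts when a ends
            elif a.onset < b.onset < a.onset + a.duration:
                edges.append((i, j, "during"))       # b starts while a sounds
    return edges


# Two simultaneous notes (C4, E4) followed by a single note (G4).
notes = [Note(0, 2, 60), Note(0, 2, 64), Note(2, 2, 67)]
roll = to_piano_roll(notes)
tokens = to_sequence(notes)
edges = to_graph(notes)
```

In this toy example the piano roll stores 128 x 4 cells for three notes, whereas the graph stores three nodes and four edges, which hints at why a graph input can be comparatively compact for sparse note data.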