Deep learning models have become a critical tool for the analysis and classification of musical data. These models operate either on the audio signal, e.g. a waveform or spectrogram, or on a symbolic representation, such as MIDI. In the latter case, musical information is often reduced to basic features, i.e. durations, pitches and velocities. Most existing works then rely on generic tokenization strategies borrowed from classical natural language processing, or on matrix representations, e.g. the piano roll. In this work, we evaluate how enriched representations of symbolic data affect deep models, i.e. Transformers and RNNs, for music style classification. In particular, we examine representations that make explicit the musical information that is only implicit in MIDI-like encodings, such as rhythmic organization, and show that they outperform generic tokenization strategies. We introduce a new tree-based representation of MIDI data built upon a context-free musical grammar. We show that this grammar representation accurately encodes high-level rhythmic information and outperforms existing encodings on the GrooveMIDI Dataset for drumming style classification, while being more compact and parameter-efficient.
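To make the two generic baselines concrete, the following minimal sketch turns a short drum pattern into a MIDI-like token sequence and into a piano-roll matrix. It is only an illustration under stated assumptions: the `Note` type, the token names (`TimeShift_`, `Pitch_`, `Velocity_`, `Duration_`) and the 8-step grid are hypothetical, and this is neither the encoding of any specific tokenizer nor the tree-based grammar representation introduced in this work.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record holding the basic MIDI features."""
    onset: int     # start time, in steps of an assumed fixed grid
    duration: int  # length, in grid steps
    pitch: int     # MIDI note number (0-127)
    velocity: int  # MIDI velocity (1-127)


def to_tokens(notes: List[Note]) -> List[str]:
    """Flatten notes into a generic MIDI-like token sequence.

    Time only appears as relative shifts between onsets, so beats and
    bars are never stated explicitly.
    """
    tokens: List[str] = []
    previous_onset = 0
    for note in sorted(notes, key=lambda n: (n.onset, n.pitch)):
        shift = note.onset - previous_onset
        if shift > 0:
            tokens.append(f"TimeShift_{shift}")
        tokens.append(f"Pitch_{note.pitch}")
        tokens.append(f"Velocity_{note.velocity}")
        tokens.append(f"Duration_{note.duration}")
        previous_onset = note.onset
    return tokens


def to_piano_roll(notes: List[Note], n_steps: int,
                  n_pitches: int = 128) -> List[List[int]]:
    """Build a pitch-by-time matrix whose cells hold the velocity (0 = silent)."""
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for note in notes:
        end = min(note.onset + note.duration, n_steps)
        for step in range(note.onset, end):
            roll[note.pitch][step] = note.velocity
    return roll


# General MIDI percussion: 36 = bass drum, 38 = snare, 42 = closed hi-hat.
pattern = [
    Note(onset=0, duration=1, pitch=36, velocity=100),
    Note(onset=0, duration=1, pitch=42, velocity=80),
    Note(onset=4, duration=1, pitch=38, velocity=110),
    Note(onset=4, duration=1, pitch=42, velocity=80),
]

tokens = to_tokens(pattern)
roll = to_piano_roll(pattern, n_steps=8)
print(tokens)
print(roll[36], roll[38], roll[42])
```

In both outputs, the metrical position of each hit has to be inferred from time shifts or column indices, which is the kind of implicit rhythmic organization that the enriched representations aim to state explicitly.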