Graphs can model polyphonic multitrack symbolic music, in which notes, chords, and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, few works consider graph representations in deep learning systems for music generation. This paper addresses this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. Separating the structure and content of musical graphs makes it possible to condition generation by specifying which instruments are played at certain times, which opens the door to a new form of human-computer interaction in music co-creation. After the model is trained on existing MIDI datasets, experiments show that it generates appealing short and long musical sequences and realistically interpolates between them, producing music that is tonally and rhythmically consistent. Finally, a visualization of the embeddings shows that the model organizes its latent space in accordance with known musical concepts.