At present, neural network-based models, including transformers, struggle to generate memorable and readily comprehensible music built from unified, repetitive musical material, because they lack an understanding of musical structure. Consequently, these models are rarely used in the games industry. Many scholars hypothesise that modelling musical structure could guide generative models at a higher level and thereby improve the quality of the generated music. This study explores the performance of supervised learning methods on structural segmentation, the first step in music structure modelling. We created a dataset of game music audio with 309 structural annotations and used it to train the proposed method, which combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The method achieves performance comparable to state-of-the-art unsupervised learning methods while requiring fewer training resources.
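The abstract does not specify the architecture beyond "CNNs combined with RNNs", so the following is only a minimal sketch of how such a combination is commonly applied to structural segmentation. It assumes that the input is a sequence of spectral feature frames (for example, a mel spectrogram with shape `(frames, features)`), that a 1D convolution over time extracts local patterns, that a simple unidirectional Elman RNN adds temporal context, and that a sigmoid output gives a per-frame probability of a section boundary. All function and parameter names (`init_params`, `boundary_probabilities`, `pick_peaks`, `kernel_size`, `threshold`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def init_params(n_features, n_channels, hidden, kernel_size=5, seed=0):
    """Hypothetical random (untrained) weights for a tiny CNN + RNN boundary detector."""
    rng = np.random.default_rng(seed)
    s = 0.1  # small scale keeps activations in a reasonable range
    return {
        "kernels": s * rng.standard_normal((kernel_size, n_features, n_channels)),
        "conv_bias": np.zeros(n_channels),
        "W_x": s * rng.standard_normal((n_channels, hidden)),
        "W_h": s * rng.standard_normal((hidden, hidden)),
        "rnn_bias": np.zeros(hidden),
        "w_out": s * rng.standard_normal(hidden),
        "b_out": 0.0,
    }


def conv1d_relu(x, kernels, bias):
    """1D convolution over time with 'same' zero padding, followed by ReLU.

    x: (T, F) feature frames; kernels: (K, F, C); returns (T, C).
    """
    T = x.shape[0]
    K, _, C = kernels.shape
    pad = K // 2
    xp = np.pad(x, ((pad, K - 1 - pad), (0, 0)))  # pad the time axis only
    out = np.empty((T, C))
    for t in range(T):
        window = xp[t:t + K]  # (K, F) frames centred on frame t
        out[t] = np.einsum("kf,kfc->c", window, kernels) + bias
    return np.maximum(out, 0.0)


def elman_rnn(inputs, W_x, W_h, b):
    """Unidirectional Elman RNN; returns the hidden state at every frame, shape (T, H)."""
    h = np.zeros(W_h.shape[0])
    states = np.empty((inputs.shape[0], W_h.shape[0]))
    for t in range(inputs.shape[0]):
        h = np.tanh(inputs[t] @ W_x + h @ W_h + b)
        states[t] = h
    return states


def boundary_probabilities(features, params):
    """Per-frame boundary probability: CNN features -> RNN context -> sigmoid."""
    conv = conv1d_relu(features, params["kernels"], params["conv_bias"])
    states = elman_rnn(conv, params["W_x"], params["W_h"], params["rnn_bias"])
    logits = states @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits))


def pick_peaks(probs, threshold=0.5):
    """Frames that are local maxima at or above the threshold (edge frames are skipped)."""
    return [
        t
        for t in range(1, len(probs) - 1)
        if probs[t] >= threshold and probs[t] > probs[t - 1] and probs[t] >= probs[t + 1]
    ]


if __name__ == "__main__":
    params = init_params(n_features=40, n_channels=8, hidden=16)
    frames = np.random.default_rng(1).standard_normal((100, 40))  # stand-in features
    probs = boundary_probabilities(frames, params)
    print(probs.shape, pick_peaks(probs))
```

In a trained system, the per-frame probabilities would be learned from the boundary annotations, and the peak-picking threshold would be tuned on held-out data. Practical CRNNs often use 2D convolutions and bidirectional GRU or LSTM layers instead of the plain Elman cell used here for brevity.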