AI-based music generation has progressed significantly in recent years. However, creating symbolic music that is both long-structured and expressive remains a considerable challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture designed to address this issue by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach enhances symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details. By combining cross-attention and self-attention in a Multi-Scale setting, PerceiverS captures long-range musical structure while preserving musical diversity. The proposed model has been evaluated on the Maestro dataset and demonstrates improvements in generating music of conventional length with expressive nuance. Project demos and generated music samples are available at: https://perceivers.github.io