Recent advances in generative models have enabled remarkable progress in music generation. However, most existing methods focus on monophonic or homophonic music, while generating polyphonic, multi-track music with rich attributes remains a challenging task. In this paper, we propose a novel approach to multi-track, multi-attribute symphonic music generation based on a diffusion model. Specifically, the diffusion model generates piano-roll representations, which are then mapped to MIDI format for output. To capture rich attribute information, we introduce a color coding scheme that encodes note sequences as color and position information representing pitch, velocity, and instrument. This scheme enables a seamless mapping between discrete music sequences and continuous images. We also propose a post-processing method that refines the generated scores to improve performance. Experimental results show that our method outperforms state-of-the-art baseline methods in generating polyphonic music with rich attribute information.
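The abstract does not spell out the exact color mapping, so the following is only a minimal sketch of how a color-coded piano roll *could* work, assuming a hypothetical scheme: the pixel position gives pitch (row) and time step (column), the red channel holds velocity scaled to [0, 1], the green channel holds an instrument index scaled to [0, 1], and the blue channel flags an occupied cell. The names `Note`, `encode`, `decode`, and `NUM_INSTRUMENTS` are illustrative assumptions, not the authors' code. Rounding during decoding is what lets continuous (and slightly noisy) image values snap back to discrete MIDI attributes.

```python
from dataclasses import dataclass

NUM_PITCHES = 128      # MIDI pitch range 0-127 (one image row per pitch)
NUM_INSTRUMENTS = 4    # hypothetical instrument palette size


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI pitch 0-127 -> image row
    start: int       # onset in time steps -> first image column
    end: int         # offset in time steps (exclusive)
    velocity: int    # MIDI velocity 1-127 -> red channel (assumed mapping)
    instrument: int  # palette index -> green channel (assumed mapping)


def encode(notes, num_steps):
    """Paint notes into a [pitch][time] grid of (R, G, B) floats in [0, 1]."""
    image = [[(0.0, 0.0, 0.0)] * num_steps for _ in range(NUM_PITCHES)]
    for n in notes:
        color = (n.velocity / 127, n.instrument / (NUM_INSTRUMENTS - 1), 1.0)
        for t in range(n.start, min(n.end, num_steps)):
            image[n.pitch][t] = color  # a later note overwrites an earlier one
    return image


def decode(image):
    """Recover notes by rounding each occupied pixel and merging equal runs."""
    notes = []
    for pitch, row in enumerate(image):
        current = None  # (start, velocity, instrument) of the run being traced
        # A blank sentinel column closes any note that reaches the last step.
        for t, (r, g, b) in enumerate(row + [(0.0, 0.0, 0.0)]):
            attrs = None
            if b > 0.5:  # treat the pixel as occupied
                attrs = (round(r * 127), round(g * (NUM_INSTRUMENTS - 1)))
            if current is not None and attrs != current[1:]:
                notes.append(Note(pitch, current[0], t, *current[1:]))
                current = None
            if attrs is not None and current is None:
                current = (t,) + attrs
    return notes


notes = [Note(60, 0, 4, 100, 0), Note(64, 2, 6, 80, 3), Note(67, 4, 8, 64, 1)]
image = encode(notes, num_steps=8)
# Simulate small deviations such as a generative model might produce.
noisy = [[(r + 0.001, g - 0.001, b * 0.9) for (r, g, b) in row] for row in image]
print(decode(noisy) == notes)  # True
```

In this simplified sketch, two back-to-back notes with the same pitch, velocity, and instrument would merge into one, and overlapping notes from different instruments at the same pitch collide in a single pixel; a practical scheme would need something like a separate onset marker or per-instrument layout to avoid this.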