GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

Symbolic music generation aims to create musical notes and can help users compose music, for example by generating target instrumental tracks from scratch or from user-provided source tracks. Because source and target tracks can be combined in many diverse and flexible ways, a unified model capable of generating arbitrary tracks is crucial. Previous works fail to address this need due to inherent constraints in their music representations and model architectures. We propose GETMusic ("GET" stands for GEnerate music Tracks), a unified representation and diffusion framework that includes a novel music representation named GETScore and a diffusion model named GETDiff. GETScore represents notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and progressing horizontally over time. During training, tracks are randomly selected as either target or source. In the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as ground truth. In the denoising process, GETDiff learns to predict the masked target tokens, conditioned on the source tracks. With separate tracks in GETScore and the non-autoregressive behavior of the model, GETMusic can explicitly control the generation of any target tracks, either from scratch or conditioned on source tracks. We conduct experiments on music generation involving six instrumental tracks, resulting in a total of 665 source-target combinations. GETMusic provides high-quality results across diverse combinations and surpasses prior works proposed for some specific combinations.
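
To make the 2D layout and the masking step more concrete, here is a minimal Python sketch. It is not the authors' implementation: the token strings, the `MASK` symbol, the function names, and the three role labels are all hypothetical. It assumes each of the six tracks is either a source, a target, or unused, and that at least one track must be a target; under that assumption the count works out to 3^6 - 2^6 = 665, which matches the figure in the abstract. The `corrupt` function masks each target token independently with a fixed probability, a simplification of a real diffusion noise schedule, and leaves unused tracks untouched.

```python
from itertools import product
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper
ROLES = ("source", "target", "unused")  # assumed role labels


def count_combinations(num_tracks):
    """Count role assignments that contain at least one target track."""
    return sum(
        1 for combo in product(ROLES, repeat=num_tracks) if "target" in combo
    )


def corrupt(score, roles, mask_prob, rng):
    """Mask tokens in target rows; copy all other rows unchanged.

    `score` is a list of rows (one row per track, one column per time step),
    mirroring the "tracks stacked vertically, time horizontal" layout.
    """
    corrupted = []
    for row, role in zip(score, roles):
        if role == "target":
            corrupted.append(
                [MASK if rng.random() < mask_prob else tok for tok in row]
            )
        else:
            corrupted.append(list(row))
    return corrupted


# Toy 3-track, 4-step grid with made-up token names.
score = [[f"trk{r}_t{c}" for c in range(4)] for r in range(3)]
roles = ["source", "target", "target"]
noisy = corrupt(score, roles, mask_prob=1.0, rng=random.Random(0))

print(count_combinations(6))  # 665
print(noisy[0])  # source row is unchanged
print(noisy[1])  # every target token is masked when mask_prob is 1.0
```

A denoising model in this setting would take `noisy` as input and be trained to recover the original tokens at the masked positions, reading the unmasked source rows as conditioning.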

