We present a new approach for fast and controllable generation of symbolic music based on simplex diffusion, which is essentially a diffusion process operating on probabilities rather than in signal space. This objective has been applied in domains such as natural language processing, but here we apply it to generating 4-bar multi-instrument music loops using an orderless representation. We show that our model can be steered with vocabulary priors, which affords a considerable level of control over the music generation process (for instance, infilling in time and pitch, and choice of instrumentation), all without task-specific model adaptation or extrinsic control.
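To make "diffusion on probabilities" and "steering with vocabulary priors" concrete, here is a minimal NumPy sketch. It assumes the logit-simplex construction used by simplex diffusion language models such as SSD-LM. Each token becomes an "almost one-hot" vector with +K at its index and -K elsewhere. Gaussian noise is added in that space, and a softmax maps any point back to a probability distribution over the vocabulary. The steering step is a hypothetical reading of "vocabulary prior": the per-position distribution is multiplied by a 0/1 prior over tokens and renormalised. The scale `K`, all function names, and the toy vocabulary are illustrative assumptions, not the authors' implementation. There is no trained denoiser here, so the noisy logits stand in for model output.

```python
import numpy as np

K = 5.0  # hypothetical simplex scale: logit magnitude of the "almost one-hot" encoding


def tokens_to_simplex(tokens: np.ndarray, vocab_size: int, k: float = K) -> np.ndarray:
    """Map integer tokens [n] to logit-simplex vectors [n, V]: +k at the token, -k elsewhere."""
    x = np.full((tokens.shape[0], vocab_size), -k)
    x[np.arange(tokens.shape[0]), tokens] = k
    return x


def add_noise(x0: np.ndarray, alpha_bar: float, rng: np.random.Generator) -> np.ndarray:
    """Gaussian forward process in logit space: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last (vocabulary) axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def apply_vocab_prior(logits: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Hypothetical steering: reweight each position's distribution by a prior [n, V].
    A zero forbids a token at that position; each row needs at least one nonzero entry."""
    p = softmax(logits) * prior
    return p / p.sum(axis=-1, keepdims=True)


# Toy usage: 4 positions, vocabulary of 8 tokens.
rng = np.random.default_rng(0)
V = 8
tokens = np.array([1, 3, 3, 7])
x0 = tokens_to_simplex(tokens, V)
xt = add_noise(x0, alpha_bar=0.5, rng=rng)  # stands in for the model's predicted logits

prior = np.ones((4, V))
prior[:, 4:] = 0.0  # e.g. forbid tokens 4..7 everywhere (say, an unwanted instrument)
prior[0] = 0.0
prior[0, 1] = 1.0   # infilling: pin position 0 to its known token

p = apply_vocab_prior(xt, prior)
print(p.argmax(axis=-1)[0])  # position 0 is forced to token 1
```

Because the prior acts on the model's output distribution rather than on its weights, the same network can be steered toward different tasks (infilling or instrument choice) just by changing `prior`. This is consistent with the abstract's claim that no task-specific adaptation is needed, though the paper's exact mechanism may differ.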