We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form as functions of note characteristics, such as note density or chord progression; many of these rules are non-differentiable, which poses a challenge when using them to guide diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked improvements in music quality and rule-based controllability, outperforming current state-of-the-art generators across a variety of settings. For detailed demonstrations, code, and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
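To make the "forward evaluation only" idea concrete, here is a minimal sketch of derivative-free guidance at one reverse-diffusion step: several stochastic candidate next states are sampled, a non-differentiable rule function is evaluated on each, and the candidate with the lowest rule loss is kept. The names (`rule_loss`, `scg_step`), the toy denoiser, and the noise scale are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rule_loss(x, target_density=0.3):
    # Hypothetical non-differentiable rule: deviation of the note density
    # of a thresholded (binary) piano roll from a target value.
    density = (x > 0.5).mean()
    return abs(density - target_density)

def scg_step(x_t, denoise, n_candidates=16, noise_scale=0.1, rng=None):
    """One guided reverse-diffusion step (simplified sketch).

    Draws several stochastic candidates for the next state and keeps the
    one with the smallest rule loss. Only forward evaluations of the rule
    are needed, so the rule may be non-differentiable.
    """
    rng = np.random.default_rng(rng)
    candidates = [
        denoise(x_t) + noise_scale * rng.standard_normal(x_t.shape)
        for _ in range(n_candidates)
    ]
    return min(candidates, key=rule_loss)

# Toy usage: identity "denoiser" on a constant piano-roll-shaped array.
x = np.full((8, 8), 0.4)
out = scg_step(x, lambda z: z, n_candidates=8, rng=0)
```

Because selection uses only rule-function outputs, any black-box scorer (note density, chord checks, etc.) can slot in without retraining or gradients.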