Denoising Diffusion Probabilistic Models (DDPMs) have made great strides in generating high-quality samples in both discrete and continuous domains. However, Discrete DDPMs (D3PMs) have yet to be applied to the domain of symbolic music. This work presents the direct generation of polyphonic symbolic music using D3PMs. Our model exhibits state-of-the-art sample quality according to current quantitative evaluation metrics and allows for flexible infilling at the note level. We further show that our models are amenable to post-hoc classifier guidance, widening the scope of possible applications. However, we also take a critical view of the quantitative evaluation of music sample quality via statistical metrics, and present a simple algorithm that can confound these metrics with completely spurious, non-musical samples.