Discrete diffusion models have attracted a surge of attention, with applications to naturally discrete data such as language and graphs. Although discrete-time discrete diffusion has been established for some time, only recently did Campbell et al. (2022) introduce the first framework for continuous-time discrete diffusion. However, their training and sampling processes differ significantly from the discrete-time version and require nontrivial approximations for tractability. In this paper, we first present a series of mathematical simplifications of the variational lower bound that enable more accurate and easier-to-optimize training for discrete diffusion. In addition, we derive a simple formulation of backward denoising that enables exact and accelerated sampling and, importantly, an elegant unification of discrete-time and continuous-time discrete diffusion. Thanks to the simpler analytical formulations, both the forward and, now, the backward probabilities can flexibly accommodate any noise distribution, including different noise distributions for multi-element objects. Experiments show that our proposed USD3 (Unified Simplified Discrete Denoising Diffusion) outperforms all state-of-the-art baselines on established datasets. We open-source our unified code at https://github.com/LingxiaoShawn/USD3.