Diffusion models have achieved remarkable empirical success in data generation tasks. Recently, several works have adapted the diffusion framework to discrete state spaces, providing a more natural approach for modeling intrinsically discrete data such as language and graphs. This is achieved by formulating both the forward noising process and the corresponding reverse process as continuous-time Markov chains (CTMCs). In this paper, we investigate the theoretical properties of the discrete diffusion model. Specifically, we introduce an algorithm that leverages the uniformization of continuous-time Markov chains, implementing transitions at random time points. Under reasonable assumptions on the learning of the discrete score function, we derive total variation distance and KL divergence guarantees for sampling from any distribution on a hypercube. Our results align with state-of-the-art results for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models over the $\mathbb{R}^d$ setting.
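To make the idea of uniformization concrete, the following is a minimal sketch of simulating a CTMC on the hypercube $\{0,1\}^d$ by drawing jump candidates from a Poisson process and applying a transition at each random time point. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `uniformization_sample`, the callback `rates(x, t)` (per-coordinate bit-flip rates, which in a reverse process would come from a learned discrete score), and the constant bound `lam` are all hypothetical choices made for this example.

```python
import numpy as np

def uniformization_sample(x0, rates, lam, T, rng):
    """Simulate a CTMC on {0,1}^d over [0, T] by uniformization.

    rates(x, t) is a hypothetical callback returning a length-d array of
    nonnegative bit-flip rates; their sum must not exceed lam, the assumed
    uniform bound on the total exit rate.
    """
    x = np.array(x0, dtype=np.int64)  # copy so the caller's array is untouched
    d = x.shape[0]
    # The number of candidate jump times is Poisson(lam * T); given the count,
    # the times are i.i.d. uniform on [0, T], visited in increasing order.
    n_events = rng.poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, size=n_events))
    for t in times:
        r = np.asarray(rates(x, t), dtype=float)
        total = r.sum()
        if total > lam + 1e-12:
            raise ValueError("total exit rate exceeds the uniformization bound lam")
        # Flip bit i with probability r[i] / lam; otherwise stay put (self-loop).
        probs = np.append(r / lam, max(0.0, 1.0 - total / lam))
        probs = probs / probs.sum()  # guard against floating-point drift
        i = rng.choice(d + 1, p=probs)
        if i < d:
            x[i] ^= 1
    return x

# Example: a forward noising process in which every bit flips at rate 1,
# so the total exit rate is exactly d and lam = d is a valid bound.
rng = np.random.default_rng(0)
d = 8
x0 = np.zeros(d, dtype=np.int64)
xT = uniformization_sample(x0, lambda x, t: np.ones(d), lam=float(d), T=5.0, rng=rng)
```

Because every candidate time either triggers a flip or a self-loop, the simulation involves no time-discretization error as long as `lam` really bounds the total exit rate; a looser bound stays exact but wastes more candidate events on self-loops.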