Diffusion models for continuous state spaces based on Gaussian noising processes are now relatively well understood from both practical and theoretical perspectives. In contrast, diffusion models on discrete state spaces remain far less explored and pose significant challenges, owing both to their combinatorial structure and to their more recent introduction in generative modelling. In this work, we establish new and sharp convergence guarantees for three popular discrete diffusion models (DDMs). Two of these models are designed for finite state spaces and are based on the random walk and the masking process, respectively. The third DDM is defined on the countably infinite space $\mathbb{N}^d$ and uses a drifted random walk as its forward process. For each of these models, the backward process can be characterized by a discrete score function that can, in principle, be estimated. However, even with perfect access to these scores, simulating the exact backward process is infeasible, and one must rely on time discretization. We study Euler-type approximations and establish convergence bounds in both Kullback-Leibler divergence and total variation distance for the resulting models, under minimal assumptions on the data distribution. To the best of our knowledge, this study provides the optimal non-asymptotic convergence guarantees for these noising processes that do not rely on boundedness assumptions on the estimated score. In particular, the computational complexity of each method scales only linearly in the dimension, up to logarithmic factors.
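To make the role of the discrete score and of the time discretization concrete, the following is a minimal sketch on a toy one-dimensional example. It is not the scheme analysed in this work, whose exact form is not given in the abstract. The assumptions are: the state space is the cycle $\mathbb{Z}_3$, the forward process is a rate-1 nearest-neighbour random walk, the score ratios $p_t(y)/p_t(x)$ are known exactly, and the backward rates are frozen over each step of length $h$ (a plain Euler step). All function names are hypothetical.

```python
import numpy as np

def cycle_generator(K):
    """Generator Q of a rate-1 nearest-neighbour random walk on Z_K (toy forward process)."""
    Q = np.zeros((K, K))
    for x in range(K):
        Q[x, (x + 1) % K] = 1.0
        Q[x, (x - 1) % K] = 1.0
        Q[x, x] = -2.0
    return Q

def forward_marginal(p0, Q, t):
    """Forward marginal p_t = p_0 exp(tQ); Q is symmetric here, so eigh applies."""
    lam, V = np.linalg.eigh(Q)
    return p0 @ (V * np.exp(t * lam)) @ V.T

def euler_backward_law(p0, Q, T, n_steps):
    """Law of the Euler-discretized backward chain, started from the uniform law.

    Over each step the backward rate from x to y is frozen at
    Q[y, x] * p_t(y) / p_t(x), where the ratio is the (exact) discrete score.
    """
    K = len(p0)
    h = T / n_steps
    idx = np.arange(K)
    law = np.full(K, 1.0 / K)  # stationary law of the forward walk
    for k in range(n_steps):
        t = T - k * h  # forward time at the start of this backward step
        p_t = forward_marginal(p0, Q, t)
        R = Q.T * (p_t[None, :] / p_t[:, None])  # R[x, y] = Q[y, x] p_t(y) / p_t(x)
        np.fill_diagonal(R, 0.0)
        P = h * R                                # jump probabilities over one step
        P[idx, idx] = 1.0 - h * R.sum(axis=1)    # probability of staying put
        if P.min() < 0.0:
            raise ValueError("step size h too large for a valid Euler step")
        law = law @ P
    return law

def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(p - q).sum()

if __name__ == "__main__":
    p0 = np.array([0.5, 0.3, 0.2])  # toy data distribution with full support
    Q = cycle_generator(3)
    for n in (30, 300):
        print(n, tv(euler_backward_law(p0, Q, 3.0, n), p0))
```

The sketch tracks the full law of the discretized chain rather than sampling trajectories, so the total variation error against the data distribution can be read off directly. That is only feasible because the toy state space is tiny. In high dimension one would sample instead, and the score ratios would be replaced by estimates.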