Diffusion models over discrete spaces have recently shown striking empirical success, yet their theoretical foundations remain incomplete. In this paper, we study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation, with a focus on $\tau$-leaping-based samplers. We establish sharp convergence guarantees for attaining $\varepsilon$ accuracy in Kullback-Leibler (KL) divergence for both uniform and masking noising processes. For uniform discrete diffusion, we show that the $\tau$-leaping algorithm achieves an iteration complexity of order $\tilde O(d/\varepsilon)$, with $d$ the ambient dimension of the target distribution, eliminating linear dependence on the vocabulary size $S$ and improving existing bounds by a factor of $d$; moreover, we establish a matching algorithmic lower bound showing that linear dependence on the ambient dimension is unavoidable in general. For masking discrete diffusion, we introduce a modified $\tau$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity, termed the effective total correlation, which is bounded by $d \log S$ but can be sublinear or even constant for structured data. As a consequence, the sampler provably adapts to low-dimensional structure without prior knowledge or algorithmic modification, yielding sublinear convergence rates for various practical examples (such as hidden Markov models, image data, and random graphs). Our analysis requires no boundedness or smoothness assumptions on the score estimator beyond control of the score entropy loss.
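To make the $\tau$-leaping mechanism concrete, the following is a minimal illustrative sketch of one $\tau$-leaping step for the *forward* uniform noising CTMC (not the paper's reverse sampler, which requires a learned score). The rate parameter `beta`, the dimensions, and the function name are all hypothetical choices for illustration: each of the $d$ coordinates jumps to a uniformly random state in $\{0,\dots,S-1\}$ at rate `beta`, and over a step of length $\tau$ the number of jumps is Poisson-distributed with the rate frozen at the step's start.

```python
import numpy as np

def tau_leap_uniform_forward(x, beta, tau, S, rng):
    """One tau-leaping step of the uniform noising CTMC (illustrative).

    Each coordinate jumps to a uniformly random state at rate beta.
    Tau-leaping draws the number of jumps over [t, t + tau] as
    Poisson(beta * tau) with the rate held fixed; since uniform jumps
    compose, any coordinate with >= 1 jump is simply resampled
    uniformly from {0, ..., S-1}.
    """
    x = np.asarray(x).copy()
    jumps = rng.poisson(beta * tau, size=x.shape)  # jump counts per coordinate
    resample = jumps >= 1
    x[resample] = rng.integers(0, S, size=resample.sum())
    return x

rng = np.random.default_rng(0)
S, d = 27, 16                        # vocabulary size, ambient dimension (arbitrary)
x = np.zeros(d, dtype=int)           # start from an arbitrary fixed sequence
for _ in range(50):                  # run the noising chain forward in time
    x = tau_leap_uniform_forward(x, beta=1.0, tau=0.1, S=S, rng=rng)
# after enough steps, x is approximately uniform on {0, ..., S-1}^d
```

The reverse-time sampler analyzed in the paper has the same leaping structure but with state-dependent rates built from the estimated score, which is where the score entropy loss enters the analysis.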