Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
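The failure of KL-based analyses under a masked prior can be illustrated on a toy example. The sketch below is not taken from the paper. It assumes a three-token vocabulary plus one mask state, a hypothetical data distribution, and a forward process that masks a token by time $t$ with probability $1 - e^{-t}$ (an illustrative schedule chosen here). Under these assumptions the forward marginal keeps mass on unmasked tokens at every finite time, so its KL divergence to the point-mass masked prior is infinite, while the total variation distance equals $e^{-t}$ and decays.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Returns math.inf when p puts mass on a state where q has none.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))

def masked_marginal(data, t):
    """Forward marginal at time t under the assumed masking schedule.

    Each token survives unmasked with probability exp(-t); the last
    entry of the returned list is the mass on the mask state.
    """
    keep = math.exp(-t)
    return [keep * d for d in data] + [1.0 - keep]

# Hypothetical data distribution over three tokens (illustration only).
data = [0.5, 0.3, 0.2]
# Masked prior: all mass on the mask state (last index).
prior = [0.0, 0.0, 0.0, 1.0]

for t in (1.0, 5.0, 10.0):
    p_t = masked_marginal(data, t)
    print(t, kl_divergence(p_t, prior), total_variation(p_t, prior))
```

The same toy setup does not exhibit the second limitation mentioned above, the dependence of existing TV bounds on the state space size $S$, since that concerns the form of the bounds rather than the distance itself.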