Discrete diffusion models have emerged as a powerful paradigm for generative modeling of sequence data; however, the information-theoretic principles governing their reverse processes remain far less understood than those of their continuous counterparts. In this work, we bridge this gap by analyzing the reverse-process dynamics through the lens of thermodynamic entropy production. We propose the entropy production rate as a rigorous proxy for quantifying information generation and, as a byproduct, derive a bound on the Wasserstein distance between intermediate states and the data distribution. Leveraging these insights, we introduce two novel sampling schedules that are uniformly spaced with respect to their corresponding physics-inspired metrics: the Entropic Discrete Schedule (EDS), defined by maintaining a constant rate of information gain, and the Wasserstein Discrete Schedule (WDS), defined by taking equal steps in Wasserstein distance. We empirically demonstrate that our schedules significantly outperform state-of-the-art strategies across diverse application domains, including synthetic data, music notation, vision, and language modeling, consistently achieving superior performance at a lower computational budget.