We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy formulations (kinetic, conditional kinetic, and geodesic) are mathematically equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. This unification clarifies the theoretical foundations of MDMs and motivates practical improvements in sampling. By parameterizing interpolation schedules via Beta distributions, we reduce the schedule design space to a tractable two-dimensional search over the two Beta shape parameters, enabling efficient post-training tuning without modifying the model. Experiments on synthetic and real-world benchmarks demonstrate that our energy-inspired schedules outperform hand-crafted baselines, particularly in low-step sampling settings.
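To make the Beta-parameterized schedule concrete, the following is a minimal sketch, not the authors' implementation. It assumes the schedule is the CDF of a Beta(a, b) distribution, read as the cumulative fraction of tokens unmasked by sampling time t in [0, 1]; the abstract does not specify the exact mapping. The function names (`beta_schedule`, `tokens_per_step`, `grid_search`) and the `score_fn` callback are hypothetical, and the CDF is approximated by midpoint integration so that only the Python standard library is needed.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_schedule(a, b, num_steps, sub=200):
    """Approximate the Beta(a, b) CDF at t_k = k / num_steps, k = 0..num_steps.

    Assumption: the schedule value is the cumulative fraction of tokens
    unmasked by time t_k. Midpoint integration with `sub` cells per step;
    the result is normalized so the last entry is 1.
    """
    cells = num_steps * sub
    h = 1.0 / cells
    acc = 0.0
    cumulative = [0.0]
    for i in range(cells):
        acc += beta_pdf((i + 0.5) * h, a, b) * h
        if (i + 1) % sub == 0:
            cumulative.append(acc)
    return [c / acc for c in cumulative]


def tokens_per_step(schedule, seq_len):
    """Convert a cumulative schedule into integer unmask counts per step."""
    targets = [round(alpha * seq_len) for alpha in schedule]
    return [targets[k + 1] - targets[k] for k in range(len(targets) - 1)]


def grid_search(score_fn, a_values, b_values):
    """Two-dimensional search over (a, b); lower score is assumed better.

    `score_fn(a, b)` is a hypothetical callback that would run the frozen
    model's sampler with the given schedule and return a quality metric.
    """
    candidates = ((a, b) for a in a_values for b in b_values)
    return min(candidates, key=lambda ab: score_fn(*ab))


if __name__ == "__main__":
    schedule = beta_schedule(a=2.0, b=5.0, num_steps=8)
    print(tokens_per_step(schedule, seq_len=128))
```

With a = b = 1 the Beta distribution is uniform, so this sketch reduces to a linear schedule that unmasks an equal share of tokens at every step. Because only the two shape parameters change, a search of this kind touches the sampler alone and leaves the trained model untouched.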