We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood from score-matching losses. Inspired by the I-MMSE identity for the Gaussian setting, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links the mutual information between the data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results yield a time-integral decomposition of the data log-likelihood in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.