Masked diffusion models (MDMs) generate text by iteratively selecting positions to unmask and then predicting tokens at those positions. Yet MDMs lack a proper likelihood evaluation: the evidence lower bound (ELBO) is not only a loose bound on the log-likelihood but, as we show, is also computed under the training distribution rather than the test-time distribution. We resolve this with our DUEL framework, which unifies leading MDM sampling strategies that employ $\textit{deterministic}$ position selection. We prove that DUEL samplers admit $\textbf{exact likelihood computation under the test-time distribution}$ -- giving MDMs a $\textit{proper}$ likelihood, and hence a proper perplexity, for the first time. This proper perplexity is the natural analogue of autoregressive perplexity and lets us revisit key questions about MDMs. $\textbf{MDMs are substantially better than previously thought}$: the MDM-autoregressive perplexity gap shrinks by up to $32\%$ on in-domain data and $82\%$ on zero-shot benchmarks. DUEL enables the first principled comparison of fast, parallel samplers across compute budgets -- an analysis impossible with the ELBO and unreliable with generative perplexity -- and identifies a strong default method. Finally, oracle search over position orderings reveals that MDMs can far surpass autoregressive models -- achieving $36.47$ vs. $52.11$ perplexity on AG News -- demonstrating that the ceiling of MDM performance has not yet been reached.
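The iterative unmask-and-predict loop described above, with deterministic position selection and a trajectory log-likelihood that yields a perplexity analogous to the autoregressive one, can be sketched as follows. This is a minimal toy illustration, not the paper's DUEL samplers: `mdm_decode`, `uniform_model`, and the greedy-confidence selection rule are assumed names and a stand-in selection policy.

```python
import math

def mdm_decode(predict_fn, length, steps):
    """Toy masked-diffusion decoding with deterministic position selection.

    `predict_fn` stands in for the model: given the partial sequence
    (None = masked), it returns {position: (token, log_prob)} for every
    still-masked position. Positions are unmasked greedily by confidence,
    and the per-token log-probs are summed, so the trajectory's likelihood
    is computed exactly under the sampling (test-time) procedure.
    """
    seq = [None] * length            # None marks a masked position
    total_logprob = 0.0
    per_step = max(1, length // steps)
    while any(t is None for t in seq):
        preds = predict_fn(seq)      # predictions for masked positions only
        # Deterministic selection: unmask the most confident positions first.
        chosen = sorted(preds, key=lambda p: preds[p][1], reverse=True)[:per_step]
        for pos in chosen:
            tok, lp = preds[pos]
            seq[pos] = tok
            total_logprob += lp      # exact likelihood of this trajectory
    # Perplexity: the natural analogue of autoregressive perplexity.
    ppl = math.exp(-total_logprob / length)
    return seq, ppl

def uniform_model(seq):
    """Dummy model: every masked position predicts "a" with probability 0.5."""
    return {i: ("a", math.log(0.5)) for i, t in enumerate(seq) if t is None}

seq, ppl = mdm_decode(uniform_model, length=8, steps=4)
# Each of the 8 tokens carries probability 0.5, so perplexity is exactly 2.0.
```

Swapping in a different `predict_fn` or selection rule changes the trajectory and hence the exact likelihood, which is what makes comparing parallel samplers across compute budgets (here, `steps`) well-posed.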