Masked diffusion models (MDMs) are a promising alternative to autoregressive models (ARMs) for language generation, but their generation quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise left-to-right) or learns an ordering policy for a pretrained MDM, which incurs extra cost and can yield suboptimal solutions due to the two-stage optimization. Motivated by this, we propose the order-expressive masked diffusion model (OeMDM), which covers a broad class of diffusion generative processes with varying generation orders and interprets MDMs, ARMs, and block diffusion within a single framework. Building on OeMDM, we further introduce the learnable-order masked diffusion model (LoMDM), which jointly learns the generation ordering and the diffusion backbone from scratch through a single objective, enabling the model to generate text in a context-dependent order. Empirically, LoMDM outperforms a range of discrete diffusion models across multiple language modeling benchmarks.
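To make the idea of jointly learning an ordering and a denoiser concrete, below is a minimal toy sketch, not the paper's actual method or architecture: a masked-diffusion training step in which the model predicts both token logits and per-position "unmask first" scores, and a single loss reweights the masked-token cross-entropy by a softmax over those learned scores. All names (`ToyOrderAwareDenoiser`, `order_head`, the weighting scheme, and the hyperparameters) are illustrative assumptions.

```python
# Toy sketch only: joint training of a denoiser and a learned unmasking
# order via one objective. Not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN, DIM = 100, 99, 16, 64  # assumed toy sizes

class ToyOrderAwareDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
            num_layers=2)
        self.token_head = nn.Linear(DIM, VOCAB)  # denoising head
        self.order_head = nn.Linear(DIM, 1)      # learned unmasking priority

    def forward(self, x):
        h = self.backbone(self.embed(x))
        return self.token_head(h), self.order_head(h).squeeze(-1)

def train_step(model, x0):
    # Sample a diffusion time t per sequence; mask each position
    # independently with probability t (standard MDM corruption).
    t = torch.rand(x0.size(0), 1)
    masked = torch.rand_like(x0, dtype=torch.float) < t
    xt = torch.where(masked, torch.full_like(x0, MASK_ID), x0)
    logits, order_scores = model(xt)
    # Per-position reconstruction loss on masked tokens, reweighted by a
    # softmax over the learned order scores so positions the model prefers
    # to unmask early dominate the objective; both heads get gradients
    # from this single loss.
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
    weights = torch.softmax(order_scores.masked_fill(~masked, -1e9), dim=-1)
    return (weights * ce * masked).sum(dim=-1).mean()

model = ToyOrderAwareDenoiser()
loss = train_step(model, torch.randint(0, VOCAB - 1, (8, SEQ_LEN)))
loss.backward()
```

At sampling time, the same `order_scores` could be used to pick which masked positions to reveal next, which is one plausible way a context-dependent generation order might emerge from a single training objective.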