Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at https://s-sahoo.com/duo-ch2
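Since the abstract only names the predictor-corrector scheme without spelling it out, the following is a minimal conceptual sketch of one generic PC loop for uniform-state discrete diffusion, not the paper's actual samplers. Everything here is an assumption for illustration: the hypothetical `model(x, t)` returning per-token logits over clean tokens, the linear noise schedule, and the forward-backward corrector that re-noises and re-denoises at the same noise level. A faithful predictor would instead sample the exact denoising posterior of the chosen noise process.

```python
import torch

@torch.no_grad()
def pc_sample(model, steps=64, correctors=1, seq_len=128, vocab=50257, device="cpu"):
    """Hypothetical predictor-corrector loop for uniform-state discrete diffusion.

    Assumes `model(x, t)` returns logits of shape (batch, seq_len, vocab) over
    clean tokens; a faithful predictor would sample the denoising posterior
    q(x_s | x_t, x0-hat) of the noise process rather than the simplification below.
    """
    # Uniform-state prior: every token starts as uniform noise.
    x = torch.randint(0, vocab, (1, seq_len), device=device)
    for i in reversed(range(1, steps + 1)):
        t = torch.full((1,), i / steps, device=device)
        # Predictor: move toward the data distribution (simplified here to
        # sampling from the model's clean-token prediction).
        x = torch.distributions.Categorical(logits=model(x, t)).sample()
        # Corrector: re-noise roughly a t-fraction of tokens back to uniform,
        # then denoise again at the same noise level -- the mechanism that
        # lets uniform-state models revisit and self-correct earlier tokens.
        for _ in range(correctors):
            renoise = torch.rand(x.shape, device=device) < t
            x = torch.where(renoise, torch.randint_like(x, vocab), x)
            x = torch.distributions.Categorical(logits=model(x, t)).sample()
    return x
```

Note that the corrector replays tokens at the current noise level rather than advancing time, which is one intuition for why extra corrector steps can keep improving quality where a purely ancestral sampler plateaus.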