Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at https://s-sahoo.com/duo-ch2
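Since the abstract only names the predictor-corrector scheme without spelling it out, the following is a minimal conceptual sketch of one generic PC loop for uniform-state discrete diffusion, not the paper's actual samplers. Everything here is an assumption for illustration: the hypothetical `model(x, t)` returning per-token logits over clean tokens, the linear noise schedule, and the forward-backward corrector that re-noises and re-denoises at the same noise level. A faithful predictor would instead sample the exact denoising posterior of the chosen noise process.

```python
import torch

@torch.no_grad()
def pc_sample(model, steps=64, correctors=1, seq_len=128, vocab=50257, device="cpu"):
    """Hypothetical predictor-corrector loop for uniform-state discrete diffusion.

    Assumes `model(x, t)` returns logits of shape (batch, seq_len, vocab) over
    clean tokens; a faithful predictor would sample the denoising posterior
    q(x_s | x_t, x0-hat) of the noise process rather than the simplification below.
    """
    # Uniform-state prior: every token starts as uniform noise.
    x = torch.randint(0, vocab, (1, seq_len), device=device)
    for i in reversed(range(1, steps + 1)):
        t = torch.full((1,), i / steps, device=device)
        # Predictor: move toward the data distribution (simplified here to
        # sampling from the model's clean-token prediction).
        x = torch.distributions.Categorical(logits=model(x, t)).sample()
        # Corrector: re-noise roughly a t-fraction of tokens back to uniform,
        # then denoise again at the same noise level -- the mechanism that
        # lets uniform-state models revisit and self-correct earlier tokens.
        for _ in range(correctors):
            renoise = torch.rand(x.shape, device=device) < t
            x = torch.where(renoise, torch.randint_like(x, vocab), x)
            x = torch.distributions.Categorical(logits=model(x, t)).sample()
    return x
```

Note that the corrector replays tokens at the current noise level rather than advancing time, which is one intuition for why extra corrector steps can keep improving quality where a purely ancestral sampler plateaus.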