Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data, thanks to their performance advantage over other discrete diffusion models, and they now rival auto-regressive models (ARMs) on language modeling tasks. Recent efforts to simplify the masked diffusion framework have further aligned it with continuous-space diffusion models and yielded more principled training and sampling recipes. In this paper, however, we show that both the training and the sampling of MDMs are theoretically free of the time variable, arguably the key signature of diffusion models, and are instead equivalent to masked models. We establish the sampling-side connection through our proposed first-hitting sampler (FHS). Specifically, we show that the FHS is theoretically equivalent to MDMs' original generation process while significantly alleviating the time-consuming categorical sampling, achieving a 20$\times$ speedup. In addition, our investigation casts doubt on whether MDMs can truly beat ARMs at text generation. We identify, for the first time, an underlying numerical issue, present even at the commonly used 32-bit floating-point precision, that makes categorical sampling inaccurate. We show, both theoretically and empirically, that this issue lowers the effective sampling temperature; the resulting loss of token diversity makes previous evaluations, which judge generation quality solely by the incomplete generative perplexity metric, somewhat unfair.
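The first-hitting idea can be sketched in a few lines. This is a hypothetical illustration only, assuming a linear masking schedule under which, with $n$ tokens still masked at time $t$, the time of the next unmasking event admits the closed-form draw $t \cdot u^{1/n}$ for $u \sim \mathrm{Uniform}(0,1)$; instead of simulating many small reverse-time steps, the sampler jumps directly from one unmasking event to the next. The function name and schedule below are illustrative, not taken verbatim from the paper's algorithm:

```python
import random


def first_hitting_times(num_tokens: int, seed: int = 0) -> list[float]:
    """Sample the successive unmasking times of all tokens, latest first.

    Hypothetical sketch: with n tokens still masked at time t, the next
    unmasking time is drawn as t * u**(1/n), u ~ Uniform(0, 1), assuming
    a linear masking schedule. Each draw replaces an entire sweep of
    per-step categorical unmasking decisions.
    """
    rng = random.Random(seed)
    t = 1.0  # reverse process starts at t = 1 with everything masked
    times = []
    for n in range(num_tokens, 0, -1):
        t *= rng.random() ** (1.0 / n)  # first-hitting time of the next unmask
        times.append(t)
    return times  # strictly decreasing times in (0, 1)
```

At each returned time one masked position would be unmasked by a single forward pass of the model, so the number of network calls and categorical draws scales with the sequence length rather than with the number of discretized time steps.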
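The kind of precision issue described can be illustrated with a minimal sketch, assuming the categorical sampling is implemented with the Gumbel-max trick (a common choice; the exact sampling kernel in MDM codebases may differ). Because a float32 uniform draw is quantized to multiples of $2^{-24}$, the Gumbel noise $-\log(-\log u)$ is capped well below what float64 allows, which truncates the noise tail and behaves like a lowered sampling temperature:

```python
import numpy as np

# Gumbel-max sampling picks argmax(logits - log(-log(u))), u ~ Uniform(0, 1).
# The largest representable uniform value below 1 caps the Gumbel noise, and
# the cap is far tighter under float32 than float64 uniforms.

def gumbel_cap(mantissa_bits: int) -> float:
    """Largest Gumbel value reachable when u is quantized to 1 - 2**-bits."""
    # log1p(-2**-bits) computes log(1 - 2**-bits) without catastrophic rounding
    return float(-np.log(-np.log1p(-(2.0 ** -mantissa_bits))))

cap32 = gumbel_cap(24)  # float32 uniforms: roughly 24 * ln(2) ~ 16.6
cap64 = gumbel_cap(53)  # float64 uniforms: roughly 53 * ln(2) ~ 36.7
```

Under float32 noise, a token whose logit trails the largest logit by more than roughly `cap32` plus the magnitude of the most negative competing noise can never win the argmax, so low-probability tokens are systematically undersampled, consistent with the effective-temperature drop described above.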