While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/mdlm
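The "mixture of classical masked language modeling losses" can be made concrete with a small sketch. The following is a minimal illustration, not the paper's implementation: it assumes a linear noise schedule alpha_t = 1 - t, under which the continuous-time weight alpha_t' / (1 - alpha_t) reduces to -1/t, so one Monte Carlo sample of the objective is a 1/t-weighted masked cross-entropy. The function `log_probs_fn` is a hypothetical stand-in for the encoder-only denoising model.

```python
import math
import random

def mdlm_loss_sketch(tokens, log_probs_fn, t, mask_id=-1):
    """One Monte Carlo sample of a masked-diffusion objective (sketch).

    With a linear schedule alpha_t = 1 - t, the loss weight
    alpha_t' / (1 - alpha_t) is -1/t, so the (negative) bound is a
    1/t-weighted masked-LM cross-entropy. `log_probs_fn` is a
    hypothetical stand-in: it maps a corrupted sequence to
    per-position log-probabilities over the vocabulary.
    """
    # Forward process: mask each token independently with probability t.
    corrupted = [mask_id if random.random() < t else tok for tok in tokens]
    log_probs = log_probs_fn(corrupted)
    # Cross-entropy only on masked positions; unmasked tokens are
    # "carried over" and contribute zero loss.
    ce = sum(-log_probs[i][tok]
             for i, tok in enumerate(tokens)
             if corrupted[i] == mask_id)
    return ce / t  # 1/t weighting from the linear schedule

# Toy usage: a "model" that predicts a uniform distribution over 4 tokens.
vocab = 4
uniform = lambda seq: [[-math.log(vocab)] * vocab for _ in seq]
loss = mdlm_loss_sketch([0, 1, 2, 3], uniform, t=0.5)
```

Averaging this quantity over random draws of t in (0, 1] yields the mixture-of-masking-rates objective described above; each draw is just a standard masked language modeling loss at a different masking ratio.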