Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks such as image generation. However, because they are inherently designed for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise does not handle discrete corruption well, and objectives defined in continuous spaces tend to be unstable for textual data during the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce Masked-Diffuse LM, a novel diffusion model for language modeling that is inspired by linguistic features of language and achieves lower training cost and better performance. Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data. We also directly predict the categorical distribution with a cross-entropy loss at every diffusion step, connecting the continuous and discrete spaces in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with better efficiency.
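The two ideas named in the abstract, a forward process that soft-masks tokens in a strategic order and a cross-entropy loss over the categorical distribution at each diffusion step, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify how tokens are ranked or how the soft mask is applied, so the per-token `importance` scores, the rule that higher-scoring tokens are masked at earlier steps, the blending weight `alpha`, and the function names `mask_schedule`, `soft_mask`, and `cross_entropy` are all assumptions made for illustration.

```python
import numpy as np


def mask_schedule(importance, num_steps):
    """Assign each token the forward step at which it becomes masked.

    Assumption: tokens with higher importance are masked at earlier steps,
    and tokens are spread evenly over steps 1..num_steps.
    """
    n = len(importance)
    order = np.argsort(-importance, kind="stable")  # most important first
    step_of_token = np.empty(n, dtype=int)
    step_of_token[order] = 1 + (np.arange(n) * num_steps) // n
    return step_of_token


def soft_mask(embeddings, mask_vec, step_of_token, t, alpha=0.9):
    """Corrupt token embeddings at forward step t.

    Tokens scheduled at or before step t are blended toward mask_vec
    (hypothetical soft mask); the remaining tokens are left unchanged.
    """
    masked = step_of_token <= t
    out = embeddings.copy()
    out[masked] = (1.0 - alpha) * embeddings[masked] + alpha * mask_vec
    return out


def cross_entropy(logits, targets):
    """Mean cross-entropy between predicted token logits and true token ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


# Usage with made-up values: 4 tokens, 4 forward steps, vocabulary of 5.
importance = np.array([0.1, 0.9, 0.5, 0.3])   # hypothetical importance scores
steps = mask_schedule(importance, num_steps=4)
noisy = soft_mask(np.eye(4), np.zeros(4), steps, t=2)
loss = cross_entropy(np.zeros((4, 5)), np.array([0, 1, 2, 3]))
```

In a training loop along these lines, a denoising network would take `noisy` at step `t` and emit per-token logits over the vocabulary, and `cross_entropy` against the original token ids would be the per-step objective. The network itself is omitted here because the abstract does not describe it.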