Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented it as an alternative to existing pretrained language models. We instead view diffusion models and existing language models as complementary. We demonstrate that encoder-decoder language models can be used to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder. This lets us sample continuous latent representations that the pretrained decoder can decode into natural language. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. Across multiple diverse datasets, we demonstrate that our latent language diffusion models are significantly more effective than previous diffusion language models.
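The abstract describes the pipeline only at a high level. The sketch below is a minimal illustration, not the authors' method. It assumes a standard Gaussian (DDPM-style) diffusion over fixed-size latent vectors, a cosine noise schedule, and a denoiser that predicts the clean latent z_0. The abstract specifies none of these choices. It shows how training and sampling could work in the autoencoder's latent space. The denoiser is a hypothetical callable that stands in for a trained network. The returned latent would then go to the pretrained decoder, which is not shown.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical denoiser signature (assumption): (noisy latent z_t, time t in [0, 1]) -> estimate of z_0.
Denoiser = Callable[[Vector, float], Vector]


def alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level for t in [0, 1] under a cosine schedule (an assumed choice)."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def q_sample(z0: Vector, t: float, rng: random.Random) -> Vector:
    """Forward process: z_t = sqrt(ab) * z_0 + sqrt(1 - ab) * eps, with eps ~ N(0, I)."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in z0]


def training_loss(z0: Vector, denoiser: Denoiser, rng: random.Random) -> float:
    """Draw t uniformly, noise the clean latent, and score the z_0 prediction by MSE."""
    t = rng.random()
    zt = q_sample(z0, t, rng)
    z0_hat = denoiser(zt, t)
    return sum((a - b) ** 2 for a, b in zip(z0, z0_hat)) / len(z0)


def sample(dim: int, denoiser: Denoiser, steps: int, rng: random.Random) -> Vector:
    """Deterministic DDIM-style sampler: start from pure noise at t=1 and step down to t=0."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in range(steps, 0, -1):
        t, t_prev = i / steps, (i - 1) / steps
        ab_t, ab_prev = alpha_bar(t), alpha_bar(t_prev)
        z0_hat = denoiser(z, t)
        # Noise implied by the current latent and the model's z_0 estimate.
        eps_hat = [(zi - math.sqrt(ab_t) * xi) / math.sqrt(1.0 - ab_t)
                   for zi, xi in zip(z, z0_hat)]
        # Re-noise the z_0 estimate to the (lower) noise level of the previous time step.
        z = [math.sqrt(ab_prev) * xi + math.sqrt(1.0 - ab_prev) * ei
             for xi, ei in zip(z0_hat, eps_hat)]
    return z  # in the full system, this latent would be passed to the pretrained decoder


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -1.0, 2.0]

    def oracle(zt: Vector, t: float) -> Vector:
        # Stand-in "perfect" denoiser for illustration only.
        return target

    print(sample(3, oracle, steps=10, rng=rng))  # → [0.5, -1.0, 2.0]
    print(training_loss(target, oracle, rng))    # → 0.0
```

The deterministic DDIM update is used here only to keep the example short. An ancestral (stochastic) sampler would fit the same z_0-prediction interface. At the final step alpha_bar equals 1, so the sampler returns the denoiser's last z_0 estimate unchanged.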