Drawing inspiration from the success of diffusion models in various domains, numerous research papers have proposed methods for adapting them to text data. Despite these efforts, none has matched the quality of large language models. In this paper, we conduct a comprehensive analysis of the key components of text diffusion models and introduce a novel approach, the Text Encoding Diffusion Model (TEncDM). Instead of the commonly used token embedding space, we train our model in the space of language model encodings. Additionally, we propose a Transformer-based decoder that uses contextual information for text reconstruction. We also analyse self-conditioning and find that it increases the magnitude of the model outputs, which allows the number of denoising steps at inference to be reduced. Evaluation of TEncDM on two downstream text generation tasks, QQP and XSum, demonstrates its superiority over existing non-autoregressive models.
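To make the role of self-conditioning at inference concrete, the following is a minimal sketch of a deterministic, DDIM-style sampling loop in which the denoiser receives its own previous estimate of the clean latent as an extra input. It is not the TEncDM implementation: the function names, the cosine noise schedule, the flat list standing in for a sequence of encodings, and the toy denoiser that replaces a trained network are all assumptions made for illustration.

```python
import math
import random


def toy_denoiser(z_t, t, x0_prev):
    """Hypothetical stand-in for a trained network.

    A real model would predict the clean latent from the noisy latent z_t,
    the time t, and the previous estimate x0_prev. Here we simply average
    the noisy input with the previous estimate.
    """
    return [0.5 * z + 0.5 * p for z, p in zip(z_t, x0_prev)]


def sample_with_self_conditioning(denoiser, dim, num_steps, seed=0):
    """Deterministic sampler with self-conditioning (illustrative only).

    Assumes a cosine schedule: z_t = cos(pi*t/2) * x0 + sin(pi*t/2) * eps.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise at t = 1
    x0_prev = [0.0] * dim  # no previous estimate exists at the first step

    for i in range(num_steps):
        t_cur = 1.0 - i / num_steps
        t_next = 1.0 - (i + 1) / num_steps

        # Self-conditioning: the previous clean-latent estimate is fed back in.
        x0_hat = denoiser(z, t_cur, x0_prev)

        a_cur, s_cur = math.cos(0.5 * math.pi * t_cur), math.sin(0.5 * math.pi * t_cur)
        a_next, s_next = math.cos(0.5 * math.pi * t_next), math.sin(0.5 * math.pi * t_next)

        # Recover the implied noise, then move to the next (less noisy) time.
        eps_hat = [(zi - a_cur * xi) / s_cur for zi, xi in zip(z, x0_hat)]
        z = [a_next * xi + s_next * ei for xi, ei in zip(x0_hat, eps_hat)]

        x0_prev = x0_hat  # reuse this estimate at the next step

    return z  # at t = 0 this equals the final clean-latent estimate


latent = sample_with_self_conditioning(toy_denoiser, dim=8, num_steps=10)
```

In a full pipeline of the kind the abstract describes, the returned latent would be passed to a decoder to reconstruct text; that stage is omitted here. The `num_steps` argument is the quantity that self-conditioning is reported to let one reduce.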