In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding in the training and deployment of large language models (LLMs). We are particularly interested in whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call ``AR2Diff''. We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff, i.e., from adapting pretrained AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
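To make the decoding-speed claim concrete, the sketch below contrasts the two inference loops. It is a minimal illustration, not the procedure from this report: `next_token`, `denoise`, and `MASK` are hypothetical placeholders for a model's next-token and denoising calls. The key contrast is that AR decoding makes one sequential model call per output token, while diffusion decoding makes a fixed number of parallel-refinement calls regardless of output length.

```python
# Illustrative sketch only: `next_token` and `denoise` stand in for
# forward passes of a trained model; they are not from this report.

MASK = "<mask>"

def ar_decode(next_token, length):
    """Autoregressive decoding: one sequential model call per token,
    so generating `length` tokens costs `length` forward passes."""
    tokens = []
    for _ in range(length):
        tokens.append(next_token(tokens))
    return tokens

def diffusion_decode(denoise, length, num_steps):
    """Diffusion decoding: start from an all-masked sequence and refine
    every position in parallel; the sequential cost is `num_steps`
    forward passes, independent of sequence length."""
    tokens = [MASK] * length
    for step in reversed(range(num_steps)):
        tokens = denoise(tokens, step)
    return tokens

# Toy usage: with length=1024 and num_steps=16, diffusion decoding makes
# 16 sequential model calls where AR decoding would make 1024.
if __name__ == "__main__":
    out = diffusion_decode(lambda toks, step: ["tok"] * len(toks),
                           length=8, num_steps=4)
    print(out)
```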