Theoretical Benefit and Limitation of Diffusion Language Model

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate--which is important for understanding the "correctness" of a sequence, such as a reasoning chain--we show that the required sampling steps must scale linearly with sequence length to obtain "correct" sequences, thereby eliminating MDM's efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.

翻译：扩散语言模型已成为一种有前景的文本生成方法。由于在每个扩散步骤中可以并行采样多个词元，人们自然会期望这种方法能成为自回归模型的高效替代方案。然而，其效率与准确性之间的权衡关系尚未得到充分理解。本文对一类广泛使用的扩散语言模型——掩码扩散模型（MDM）进行了严格的理论分析，发现其有效性在很大程度上取决于目标评估指标。在温和条件下，我们证明当使用困惑度作为评估指标时，无论序列长度如何，MDM都能在采样步数内实现接近最优的困惑度，这表明效率可以在不牺牲性能的情况下实现。然而，当使用序列错误率（这对于理解序列的“正确性”至关重要，例如推理链）时，我们证明所需的采样步数必须与序列长度呈线性比例才能获得“正确”序列，从而消除了MDM相对于自回归模型的效率优势。我们的分析为理解MDM的优势与局限性建立了首个理论基础。所有理论发现均得到了实证研究的支持。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日