Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs rely predominantly on token-level uncertainty criteria, largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose the Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits the attention matrices of transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning tasks. Moreover, DOS integrates seamlessly with existing parallel sampling methods, improving generation efficiency without sacrificing quality.
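To make the mechanism concrete, the sketch below illustrates one way a dependency-oriented selection step could work in PyTorch. The abstract does not specify the exact scoring rule, so everything here is an assumption: the head-averaged attention, the product of dependency mass and token confidence, and all names (`dos_select`, `attn`, `mask_positions`, `k`) are hypothetical illustrations rather than the paper's definitive algorithm.

```python
import torch

def dos_select(attn, logits, mask_positions, k=1):
    """Minimal sketch of a dependency-oriented unmasking step (hypothetical
    interface; the exact scoring rule in DOS is not specified here).

    attn:           (num_heads, seq_len, seq_len) attention weights from a
                    transformer block at the current denoising step.
    logits:         (seq_len, vocab_size) model logits at the current step.
    mask_positions: (seq_len,) boolean tensor, True where the token is [MASK].
    k:              number of masked positions to unmask this step.
    """
    # Average attention over heads: attn_avg[i, j] = how much position i
    # attends to position j.
    attn_avg = attn.mean(dim=0)                       # (seq_len, seq_len)

    # Dependency score for each position: total attention mass it places on
    # already-unmasked positions (assumption: a masked position that attends
    # strongly to known context is more reliable to decode now).
    unmasked = (~mask_positions).float()              # (seq_len,)
    dep_score = attn_avg @ unmasked                   # (seq_len,)

    # Standard token-level confidence (max softmax probability).
    conf = logits.softmax(dim=-1).max(dim=-1).values  # (seq_len,)

    # Combine dependency and confidence; only masked positions compete.
    score = dep_score * conf
    score = score.masked_fill(~mask_positions, float("-inf"))

    # Unmask the k highest-scoring positions with their argmax tokens.
    positions = score.topk(k).indices
    tokens = logits[positions].argmax(dim=-1)
    return positions, tokens
```

In an actual denoising loop, a step like this would replace the purely confidence-based selection criterion while leaving model weights untouched, consistent with DOS being training-free; raising `k` recovers parallel sampling of multiple positions per step.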