Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation, yet how this structure can be leveraged to control generation diversity, i.e., to explore multiple valid semantic or reasoning paths, remains underexplored. In this paper, we show that Diffusion-LMs, like diffusion models in image generation, exhibit a temporal division of labor: early denoising steps largely determine the global semantic structure, while later steps focus on local lexical refinement. Building on this insight, we propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching early in the diffusion process while progressively reducing perturbations to preserve fluency and instruction adherence. TAPS is compatible with both non-autoregressive and semi-autoregressive diffusion backbones, which we demonstrate on LLaDA and TraDo, and it consistently improves output diversity across creative writing and reasoning benchmarks without compromising generation quality.
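The time-annealed perturbation idea can be illustrated with a minimal sketch. The abstract does not specify the schedule TAPS uses, so the cosine decay below, the function name, and the `sigma0` parameter are all illustrative assumptions, not the paper's actual implementation: perturbation strength starts high in early denoising steps (encouraging semantic branching) and decays toward zero in late steps (preserving fluency).

```python
import math

def taps_noise_scale(step, total_steps, sigma0=1.0, schedule="cosine"):
    """Toy annealing schedule (illustrative, not from the paper):
    perturbation strength decays from sigma0 at the first denoising
    step toward 0 at the last step."""
    t = step / max(total_steps - 1, 1)  # normalized time in [0, 1]
    if schedule == "cosine":
        # Smooth decay: sigma0 at t=0, exactly 0 at t=1.
        return sigma0 * 0.5 * (1.0 + math.cos(math.pi * t))
    return sigma0 * (1.0 - t)  # linear fallback

# Early steps receive large perturbations (global semantic branching),
# late steps receive near-zero perturbations (local lexical refinement).
scales = [taps_noise_scale(s, 10) for s in range(10)]
```

At inference time, such a scale could modulate whatever perturbation mechanism the sampler applies at each denoising step (e.g., noise added to logits before token selection), so that diversity is injected only where the temporal division of labor suggests it is safe.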