Despite the non-autoregressive potential of diffusion language models (dLLMs), existing decoding strategies exhibit positional bias and fail to fully unlock the potential of arbitrary-order generation. In this work, we examine the inherent spectral characteristics of dLLMs and present the first frequency-domain analysis showing that low-frequency components of the hidden states primarily encode global structure and long-range dependencies, while high-frequency components capture local details. Based on this observation, we propose FourierSampler, which leverages a frequency-domain sliding-window mechanism to dynamically guide the model toward "structure-to-detail" generation. FourierSampler outperforms other inference-enhancement strategies on LLaDA and SDAR, achieving relative improvements of 20.4% on LLaDA1.5-8B and 16.0% on LLaDA-8B-Instruct, and notably surpasses similarly sized autoregressive models such as Llama3.1-8B-Instruct.
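To make the "structure-to-detail" idea concrete, the following is a minimal, hypothetical sketch of a frequency-domain sliding window: the hidden states are transformed along the sequence axis with an FFT, and a band-pass window slides from the lowest frequency bins (global structure) toward the highest (local detail) as decoding progresses. All function and parameter names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sliding_frequency_window(hidden, step, total_steps, keep_frac=0.25):
    """Illustrative sketch (not the paper's implementation): band-pass
    filter hidden states with a window that slides from low to high
    frequencies over the course of decoding.

    hidden: (seq_len, dim) array of hidden states
    step:   current decoding step in [0, total_steps)
    """
    seq_len = hidden.shape[0]
    # One-sided spectrum along the sequence axis: (seq_len//2 + 1, dim)
    spec = np.fft.rfft(hidden, axis=0)
    n_bins = spec.shape[0]
    width = max(1, int(keep_frac * n_bins))  # window width in frequency bins
    # Window start moves from bin 0 (low frequencies, global structure)
    # to the top of the spectrum (high frequencies, local detail).
    start = int((step / max(1, total_steps - 1)) * (n_bins - width))
    mask = np.zeros(n_bins)
    mask[start:start + width] = 1.0
    return np.fft.irfft(spec * mask[:, None], n=seq_len, axis=0)
```

Early steps thus retain only the low-frequency, structure-bearing content of the hidden states, while later steps emphasize high-frequency local detail, mirroring the structure-to-detail schedule described above.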