Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the long decoding-window problem, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions, such as semi-autoregressive (semi-AR) decoding, address this issue by splitting the decoding window into blocks (sacrificing bidirectionality), but we find that this also introduces a time-interval expansion problem, sacrificing speed. As a result, semi-AR decoding forfeits the main advantages of diffusion models. To overcome this, we propose Convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation, improving fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from the context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, while requiring significantly fewer decoding steps than previous works, demonstrating improvements in both speed and quality.