Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache compatibility, prior work commonly adopts block-based diffusion, decoding tokens block by block. However, this paradigm suffers from a structural limitation that we term Boundary-Induced Context Truncation (BICT): undecoded tokens near block boundaries are forced to commit without access to nearby future context, even when that context could substantially reduce their uncertainty. This limitation degrades decoding confidence and generation quality, especially on tasks requiring precise reasoning such as mathematical problem solving and code generation. We propose Deferred Commitment Decoding (DCD), a training-free decoding strategy that mitigates this issue. DCD maintains a confidence-aware sliding window over masked tokens, committing low-uncertainty tokens early while deferring high-uncertainty tokens until sufficient contextual evidence becomes available. This design enables effective bidirectional information flow within the decoding window without sacrificing efficiency. Extensive experiments across multiple diffusion language models, benchmarks, and caching configurations show that DCD improves generation accuracy by 1.39% on average over fixed block-based diffusion methods at comparable decoding time, with gains of up to 9.0%. These results demonstrate that deferring token commitment based on uncertainty is a simple yet effective principle for improving both the quality and efficiency of diffusion language model decoding.
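To make the decoding rule concrete, the sketch below shows one way a confidence-aware sliding window over masked tokens could be realized; it is a minimal illustration under assumed names, not the paper's implementation. In particular, `model_logits`, `MASK_ID`, the window size, and the threshold `tau` are hypothetical placeholders.

```python
import torch

MASK_ID = 0  # hypothetical mask-token id; real DLMs define their own

def dcd_decode(model_logits, seq, window=32, tau=0.9, max_steps=256):
    """Illustrative confidence-aware sliding-window decoding loop.

    model_logits: callable(seq) -> [len(seq), vocab_size] logits for all positions
                  (hypothetical interface standing in for a diffusion LM forward pass).
    seq: 1-D LongTensor of prompt tokens followed by MASK_ID placeholders.
    window: number of leading masked positions considered per step.
    tau: confidence threshold below which a token's commitment is deferred.
    """
    seq = seq.clone()
    for _ in range(max_steps):
        masked = (seq == MASK_ID).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break                                       # everything is committed
        active = masked[:window]                        # sliding window over masked tokens
        probs = torch.softmax(model_logits(seq), dim=-1)
        conf, pred = probs[active].max(dim=-1)          # per-position confidence and prediction
        ready = conf >= tau                             # commit only sufficiently confident positions
        if not ready.any():
            ready[conf.argmax()] = True                 # always commit at least one token to make progress
        seq[active[ready]] = pred[ready]                # low-confidence positions stay masked (deferred)
    return seq
```

In this sketch, positions whose confidence falls below `tau` remain masked and are revisited in later steps, after neighboring tokens within the window have been committed and can serve as bidirectional context.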