Diffusion large language models (dLLMs) have emerged as a promising alternative for text generation, distinguished by their native support for parallel decoding. In practice, block inference is crucial for avoiding order misalignment in global bidirectional decoding and for improving output quality. However, the widely used fixed, predefined (naive) block schedule is agnostic to semantic difficulty, making it a suboptimal strategy for both quality and efficiency: it can force premature commitments at uncertain positions while delaying easy positions near block boundaries. In this work, we analyze the limitations of naive block scheduling and reveal the importance of dynamically adapting the schedule to semantic difficulty for reliable and efficient inference. Motivated by this, we propose Dynamic Sliding Block (DSB), a training-free block scheduling method that uses a sliding block with a dynamic size to overcome the rigidity of the naive block. To further improve efficiency, we introduce DSB Cache, a training-free KV-cache mechanism tailored to DSB. Extensive experiments across multiple models and benchmarks demonstrate that DSB, together with DSB Cache, consistently improves both generation quality and inference efficiency for dLLMs. Code is released at https://github.com/lizhuo-luo/DSB.