Block-wise decoding effectively improves the inference speed and quality of diffusion language models (DLMs) by combining inter-block sequential denoising with intra-block parallel unmasking. However, existing block-wise decoding methods typically partition blocks in a rigid, fixed manner, which inevitably fragments complete semantic or syntactic constituents and leads to suboptimal performance. Inspired by the entropy reduction hypothesis (ERH), we observe that constituent boundaries offer greater opportunities for uncertainty reduction, which motivates us to employ entropy analysis to identify constituent boundaries. We therefore propose Swordsman, an entropy-driven adaptive block-wise decoding framework for DLMs. Swordsman adaptively partitions blocks by detecting entropy shifts between adjacent tokens, better aligning block boundaries with semantic or syntactic constituents. In addition, Swordsman dynamically adjusts unmasking thresholds conditioned on the real-time unmasking status within a block, further improving both efficiency and stability. As a training-free framework compatible with KV caching, Swordsman demonstrates state-of-the-art performance across extensive evaluations.