Discrete diffusion language models enable parallel token generation, offering a pathway to low-latency decoding. However, selecting tokens independently by marginal confidence limits effective parallelism: tokens that appear reliable in isolation can form incompatible configurations when several positions are updated at once. We introduce a training-free decoding framework that coordinates these parallel updates. At each forward pass, the method assigns a commit score to each masked position and refines these scores using pairwise interactions derived from the model's predictive distributions. A variational relaxation yields a simple fixed-point update that suppresses conflicting simultaneous commitments within a single forward pass. This mechanism allows the decoder to commit more tokens in parallel while maintaining competitive generation quality. The method is lightweight, requires no auxiliary model or retraining, and drops into existing diffusion decoding pipelines without modification. Experiments on reasoning and code-generation benchmarks show consistent improvements in the quality-latency trade-off.
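The coordination step described above can be pictured with a small sketch. The abstract does not give the actual update rule, so everything here is an assumption made for illustration: the function name `refine_commit_scores`, the sigmoid mean-field form, the damping, and the parameters `strength`, `threshold`, and `sharpness` are all hypothetical. The pairwise `interaction` matrix is taken as an input, because how it is derived from the model's predictive distributions is not specified in this text.

```python
import math


def refine_commit_scores(confidence, interaction, strength=1.0,
                         threshold=0.5, sharpness=10.0, iters=50, damping=0.5):
    """Illustrative mean-field style fixed-point refinement (not the paper's exact rule).

    confidence[i]     marginal confidence of masked position i, in [0, 1]
    interaction[i][j] assumed non-negative conflict penalty between positions i and j
    Returns q, where q[i] in (0, 1) is a relaxed "commit now" score for position i.
    """
    n = len(confidence)
    q = [0.5] * n  # start undecided at every position
    for _ in range(iters):
        updated = []
        for i in range(n):
            # Penalty grows when conflicting neighbours are also likely to commit.
            penalty = sum(interaction[i][j] * q[j] for j in range(n) if j != i)
            logit = sharpness * (confidence[i] - threshold - strength * penalty)
            updated.append(1.0 / (1.0 + math.exp(-logit)))
        # Damped parallel update, which helps avoid oscillation between iterations.
        q = [damping * old + (1.0 - damping) * new for old, new in zip(q, updated)]
    return q


# Toy input (made up): positions 0 and 1 conflict with each other, position 2 is independent.
confidence = [0.90, 0.85, 0.90]
interaction = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
scores = refine_commit_scores(confidence, interaction)
commit = [i for i, s in enumerate(scores) if s > 0.5]  # positions to commit this pass
```

In this toy case all three positions look reliable by marginal confidence alone, but the refinement keeps only one of the two conflicting positions (the more confident one) and leaves the other masked for a later pass. The refinement uses quantities already available after one forward pass, so it adds no extra model calls.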