Diffusion Language Models (DLMs) generate text by iteratively denoising a masked sequence, repeatedly deciding which positions to commit at each step. Standard decoding follows a greedy rule, unmasking the most confident positions first, yet this local choice can lock the model into a suboptimal unmasking order, especially on reasoning-heavy prompts. We present SOAR, a training-free decoding algorithm that adapts its behavior to the model's uncertainty. When confidence is low, SOAR briefly widens the search over alternative unmasking decisions to avoid premature commitments; when confidence is high, it collapses the search and decodes many positions in parallel to reduce the number of denoising iterations. Across mathematical reasoning and code generation benchmarks (GSM8K, MBPP, HumanEval) on Dream-7B and LLaDA-8B, SOAR improves generation quality while maintaining competitive inference speed, offering a practical way to balance quality and efficiency in DLM decoding.
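The confidence-adaptive decision at the heart of the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the `commit_threshold` and `explore_width` parameters, and the two-way commit/explore split are all assumptions chosen to mirror the described behavior.

```python
def adaptive_unmask_step(confidences, commit_threshold=0.9, explore_width=4):
    """Decide what to do with still-masked positions at one denoising step.

    confidences: dict mapping each masked position to the model's max token
    probability there. Returns ("commit", positions) when confidence is high
    enough to decode several positions in parallel, or ("explore", positions)
    naming the top candidates to search over when confidence is low.
    Threshold and width values here are illustrative, not from the paper.
    """
    if not confidences:
        return "done", []
    best = max(confidences.values())
    if best >= commit_threshold:
        # High confidence: commit every position above the threshold at once,
        # reducing the number of remaining denoising iterations.
        commit = [p for p, c in confidences.items() if c >= commit_threshold]
        return "commit", sorted(commit)
    # Low confidence: widen the search over the top-k alternative
    # unmasking decisions instead of greedily committing the argmax.
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return "explore", ranked[:explore_width]
```

For example, `adaptive_unmask_step({0: 0.95, 1: 0.5, 2: 0.92})` commits positions 0 and 2 in parallel, while `adaptive_unmask_step({0: 0.6, 1: 0.5, 2: 0.4}, explore_width=2)` instead flags positions 0 and 1 for a wider search.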