Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models, but require careful planning to achieve high-quality generation. Existing samplers typically adopt greedy heuristics, at each step decoding the positions with the highest local certainty. Through failure-case analysis, we identify a fundamental limitation of this approach: it neglects the downstream impact of current decoding choices on subsequent steps and fails to minimize cumulative uncertainty. In particular, these methods do not fully exploit the non-causal nature of MDMs, which makes it possible to evaluate how a decoding decision reshapes the token probabilities, and hence the uncertainty, at all remaining masked positions. To bridge this gap, we propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty against information gain over future masked tokens. Extensive evaluations across diverse architectures and tasks (reasoning, coding, creative writing, and image generation) demonstrate that the Info-Gain Sampler consistently outperforms existing samplers for MDMs. For instance, it achieves a 3.6% improvement in average accuracy on reasoning tasks and a 63.1% win rate in creative writing. Notably, on reasoning tasks it reduces cumulative uncertainty from 78.4 to 48.6, outperforming the best baseline by a large margin. The code will be available at https://github.com/yks23/Information-Gain-Sampler.
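To make the core idea concrete, the sketch below illustrates information-gain position selection on a toy joint distribution over three binary tokens, standing in for an MDM's non-causal predictive distribution. All names here (`marginals`, `info_gain_score`, `select_position`, the weight `lam`) are hypothetical illustrations, not the paper's actual implementation: a position's score trades off its local entropy against the expected entropy reduction its decoding induces at the other masked positions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# Toy joint over 3 binary token positions; a stand-in for the model's
# non-causal predictive distribution (hypothetical, for illustration only).
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

def marginals(decoded):
    """Per-position marginals for the still-masked positions,
    conditioned on already-decoded tokens ({position: token})."""
    p = joint.copy()
    for pos, tok in decoded.items():
        p = np.take(p, [tok], axis=pos)  # condition; keep axis for indexing
    p = p / p.sum()
    out = {}
    for pos in range(joint.ndim):
        if pos not in decoded:
            axes = tuple(a for a in range(joint.ndim) if a != pos)
            out[pos] = p.sum(axis=axes).ravel()
    return out

def info_gain_score(decoded, pos):
    """Expected information gain of committing `pos` now: entropy of the
    other masked positions minus its expectation after observing `pos`."""
    m = marginals(decoded)
    others_now = sum(entropy(q) for j, q in m.items() if j != pos)
    expected_after = 0.0
    for tok, p_tok in enumerate(m[pos]):
        m_next = marginals({**decoded, pos: tok})
        expected_after += p_tok * sum(entropy(q) for q in m_next.values())
    return others_now - expected_after

def select_position(decoded, lam=1.0):
    """Pick the masked position balancing local certainty (low entropy)
    and future information gain, weighted by `lam` (an assumed knob)."""
    m = marginals(decoded)
    scores = {pos: -entropy(q) + lam * info_gain_score(decoded, pos)
              for pos, q in m.items()}
    return max(scores, key=scores.get)
```

With `lam=0` this reduces to the greedy lowest-entropy heuristic the abstract critiques; a positive `lam` prefers positions whose decoding most sharpens the distributions at the remaining masked positions.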