Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models, but they require careful planning to achieve high-quality generation. Existing samplers typically adopt greedy heuristics, decoding at each step the positions with the highest local certainty. Through failure-case analysis, we identify a fundamental limitation of this approach: it neglects the downstream impact of current decoding choices on subsequent steps and fails to minimize cumulative uncertainty. In particular, these methods do not fully exploit the non-causal nature of MDMs, which makes it possible to evaluate how a decoding decision reshapes the token probabilities, and hence the uncertainty, at all remaining masked positions. To bridge this gap, we propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty against the information gained about future masked tokens. Extensive evaluations across diverse architectures and tasks (reasoning, coding, creative writing, and image generation) demonstrate that the Info-Gain Sampler consistently outperforms existing samplers for MDMs. For instance, it achieves a 3.6% improvement in average accuracy on reasoning tasks and a 63.1% win rate in creative writing. Notably, on reasoning tasks it reduces cumulative uncertainty from 78.4 to 48.6, outperforming the best baseline by a large margin. The code will be available at https://github.com/yks23/Information-Gain-Sampler.
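The scoring idea above, trading local certainty for entropy reduction across the remaining masked positions, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `info_gain_scores`, `toy_probs`, and the correlated two-position model are hypothetical names and numbers chosen only to show how a non-causal model lets a commitment at one position collapse the uncertainty at another.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def info_gain_scores(masked, probs_fn):
    """Score each masked position by how much committing its argmax
    token reduces the summed entropy of the *other* masked positions.

    masked:   list of masked position indices.
    probs_fn: hypothetical model query; takes {pos: committed_token}
              (possibly empty) and returns {pos: distribution}.
    """
    base = probs_fn({})
    scores = {}
    for i in masked:
        tok = max(range(len(base[i])), key=base[i].__getitem__)
        after = probs_fn({i: tok})  # non-causal re-query after a commit
        before_h = sum(entropy(base[j]) for j in masked if j != i)
        after_h = sum(entropy(after[j]) for j in masked if j != i)
        scores[i] = before_h - after_h
    return scores

def toy_probs(commit):
    """Hypothetical two-position toy 'model' (not from the paper).

    Position 0 is locally more confident (0.8 vs 0.7), but the
    positions are correlated: committing position 1 collapses
    position 0's distribution, while committing position 0 leaves
    position 1 unchanged.
    """
    if commit.get(1) == 0:
        return {0: [0.95, 0.05], 1: [1.0, 0.0]}
    return {0: [0.8, 0.2], 1: [0.7, 0.3]}

scores = info_gain_scores([0, 1], toy_probs)
# Greedy certainty would decode position 0 first (0.8 > 0.7), but the
# info-gain score prefers position 1, whose commitment resolves position 0.
best = max(scores, key=scores.get)
```

The contrast with a purely greedy sampler is the whole point of the toy: the locally most confident position carries no information about the rest of the sequence, while the slightly less confident one does.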