Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, so errors accumulate and ultimately degrade sample quality. We address this by proposing a framework that trains a single model to perform both unmasking and correction. By reusing outputs of the MDM denoising network as inputs for corrector training, we train the model to recover from its own potential mistakes. During generation, we interleave additional corrective refinement steps between unmasking steps so that already-decoded tokens can be revised and outputs improved. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its ability to iteratively refine an entire sequence, including already generated tokens. We conduct extensive experimental validation across multiple conditional and unconditional tasks, demonstrating that ProSeCo yields better quality-efficiency trade-offs (up to ~4x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.2x improvement on benchmarks).
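As a rough illustration of the sampling procedure described above, the following minimal sketch interleaves corrective passes with unmasking steps. It is not the authors' implementation: the names `MASK`, `toy_predict`, and `sample_with_correction`, the random choice of positions to unmask, and the fixed number of corrective passes per step are all assumptions made for illustration, and the toy predictor merely stands in for a trained denoising network.

```python
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask token id (assumption, not from the paper)

Predictor = Callable[[Sequence[int]], List[int]]


def toy_predict(tokens: Sequence[int]) -> List[int]:
    """Stand-in for a denoising network (purely illustrative).

    It guesses wrongly (all zeros) while more than half of the sequence is
    still masked, and returns the target sequence once enough context exists.
    """
    n = len(tokens)
    n_masked = sum(1 for t in tokens if t == MASK)
    if n_masked > n // 2:
        return [0] * n
    return [10 + i for i in range(n)]


def sample_with_correction(
    predict: Predictor,
    length: int,
    unmask_steps: int,
    corrector_steps: int,
    seed: int = 0,
) -> List[int]:
    """Unmask a few positions per step, then optionally revise decoded ones."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    per_step = -(-length // unmask_steps)  # ceiling division

    for _ in range(unmask_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break

        # Unmasking step: commit predictions at a random subset of masked slots.
        proposal = predict(tokens)
        for i in rng.sample(masked, min(per_step, len(masked))):
            tokens[i] = proposal[i]

        # Corrective steps: re-predict and overwrite already-decoded positions,
        # leaving masked positions untouched.
        for _ in range(corrector_steps):
            proposal = predict(tokens)
            tokens = [t if t == MASK else p for t, p in zip(tokens, proposal)]

    return tokens


if __name__ == "__main__":
    print(sample_with_correction(toy_predict, 8, 4, corrector_steps=0))
    print(sample_with_correction(toy_predict, 8, 4, corrector_steps=1))
```

With `corrector_steps=0` the sketch behaves like a standard MDM sampler in which early mistakes stay frozen; with `corrector_steps=1` the tokens committed early under poor context are overwritten once later predictions improve. Raising `corrector_steps` is one simple way to picture spending more inference-time compute on refinement.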