While Diffusion Language Models (DLMs) are theoretically well suited to iterative refinement due to their non-causal structure, in practice they often fail to reliably revise incorrect tokens. The key challenge lies in the model's inability to distinguish correct from erroneous tokens in a visible sequence: standard masked diffusion language model (MDLM) training optimizes only an unmasking objective, which undermines confidence-guided refinement. Based on this observation, we study corrective behavior in DLMs, defined as the ability to assign lower confidence to incorrect tokens and iteratively refine them while preserving correct content. We show that this capability is not induced by conventional masked diffusion objectives, and we propose a correction-oriented post-training principle that explicitly supervises visible incorrect tokens, enabling discriminative confidence and targeted refinement. To evaluate corrective behavior, we introduce the Code Revision Benchmark, a controllable and executable benchmark for assessing error localization and in-place correction. Experiments on code revision tasks and parallel decoding scenarios show that models trained with our approach substantially outperform standard MDLMs, with gains that are most pronounced when parallel decoding introduces substantial uncertainty and iterative refinement becomes essential. Our code is publicly available at https://github.com/zhangshuibai/CDLM.
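The corrective behavior described above can be sketched as a simple loop: score each visible token's confidence, remask the low-confidence positions, and repredict them while leaving high-confidence tokens untouched. The sketch below is a hedged toy illustration, not the paper's method: `score_tokens` and `repredict` are hypothetical stand-ins for a real DLM's per-token probabilities and conditional re-decoding, so only the control flow reflects the refinement procedure.

```python
# Toy sketch of confidence-guided iterative refinement in a DLM.
# `score_tokens` and `repredict` are hypothetical placeholders: a real
# model would use softmax probabilities and sample remasked positions
# from its conditional distribution.

def score_tokens(tokens, target):
    """Toy confidence: high for tokens matching the (hidden) target,
    low for erroneous ones."""
    return [0.9 if t == g else 0.2 for t, g in zip(tokens, target)]

def repredict(position, target):
    """Toy re-decoding of a remasked position."""
    return target[position]

def refine(tokens, target, threshold=0.5, max_steps=5):
    """Remask and repredict low-confidence tokens until none remain,
    preserving high-confidence (correct) content."""
    for _ in range(max_steps):
        conf = score_tokens(tokens, target)
        low = [i for i, c in enumerate(conf) if c < threshold]
        if not low:
            break  # all tokens confident; refinement converged
        for i in low:
            tokens[i] = repredict(i, target)
    return tokens

target = ["def", "add", "(", "a", ",", "b", ")", ":"]
draft  = ["def", "sub", "(", "a", ",", "b", ")", ";"]  # two erroneous tokens
print(refine(draft, target))
```

The point of the paper is precisely that a standard MDLM does not provide the discriminative `score_tokens` this loop needs: without supervision on visible incorrect tokens, confidence fails to separate correct from erroneous positions.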