Non-autoregressive (NAR) generation reduces decoding latency by predicting many tokens in parallel, but iterative refinement often suffers from error accumulation and distribution shift under self-generated drafts. Masked diffusion language models (MDLMs) and their remasking samplers (e.g., ReMDM) can be viewed as modern NAR iterative refinement, where generation repeatedly revises a partially observed draft. In this work we show that \emph{training alone} can substantially improve the step-efficiency of MDLM/ReMDM sampling. We propose \textsc{DSL} (Discrete Stochastic Localization), which trains a single SNR-invariant denoiser across a continuum of corruption levels, bridging intermediate draft noise and mask-style endpoint corruption within one Diffusion Transformer. On OpenWebText, \textsc{DSL} fine-tuning yields large MAUVE gains at low step budgets, surpassing the MDLM+ReMDM baseline with \(\sim 4\times\) fewer denoiser evaluations, and matches autoregressive quality at high budgets. Analyses show improved self-correction and uncertainty calibration, making remasking markedly more compute-efficient.
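For concreteness, a minimal sketch of the kind of objective the abstract describes is given below, assuming a standard masked-diffusion denoising cross-entropy averaged over a continuum of corruption levels; the symbols \(\alpha\), \(q_\alpha\), and \(\mathcal{C}\) are illustrative assumptions, not the paper's own notation, and the exact weighting and SNR-invariance construction of \textsc{DSL} may differ.
\[
\mathcal{L}_{\textsc{DSL}}(\theta)
\;=\;
\mathbb{E}_{x_0}\,
\mathbb{E}_{\alpha \sim \mathcal{U}(0,1)}\,
\mathbb{E}_{x_\alpha \sim q_\alpha(\cdot \mid x_0)}
\Bigg[
  -\sum_{i \in \mathcal{C}(x_\alpha)}
  \log p_\theta\!\big(x_0^{(i)} \,\big|\, x_\alpha\big)
\Bigg],
\]
where \(q_\alpha\) denotes a corruption process that interpolates between intermediate draft noise and fully masked endpoint corruption, \(\mathcal{C}(x_\alpha)\) indexes the corrupted positions, and a single denoiser \(p_\theta\) is shared across all corruption levels \(\alpha\).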