Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoising steps selected before decoding, spending computation on already-stable positions and sometimes committing unstable ones too early. We present \textsc{LESS}, a training-free, model-agnostic adaptive sampler that treats token commitment as an online stopping problem. \textsc{LESS} implements mutual-stability sampling through a joint stability rule that makes a masked position eligible for unmasking only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-$K$ inter-step Jensen--Shannon divergence. We evaluate \textsc{LESS} on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B, covering full-sequence diffusion and semi-autoregressive blockwise sampling regimes, across seven benchmarks spanning general knowledge, math, and code. \textsc{LESS} improves average accuracy over strong training-free adaptive samplers while using $72.1\%$ fewer reverse steps than fixed-budget decoding. Since each reverse step requires a Transformer forward pass, these step-count reductions translate into fewer forward evaluations, lower measured wall-clock latency, and lower estimated inference compute.
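The joint stability rule can be illustrated with a per-position eligibility check. This is a minimal sketch and not the \textsc{LESS} implementation. The function names (`top_k_indices`, `top_k_js`, `is_eligible`), the default thresholds, and the persistence window are hypothetical. The top-$K$ Jensen--Shannon divergence is also one assumed construction. Both distributions are restricted to the union of their top-$K$ tokens, and the remaining mass goes into a single residual bucket. The check compares only the two most recent reverse steps.

```python
import math
from typing import List, Sequence


def top_k_indices(p: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest probabilities (ties broken by lower index)."""
    return sorted(range(len(p)), key=lambda i: (-p[i], i))[:k]


def top_k_js(p: Sequence[float], q: Sequence[float], k: int) -> float:
    """Assumed top-K JSD: restrict p and q to the union of their top-k tokens,
    lump all remaining probability mass into one residual bucket, then compute
    the Jensen-Shannon divergence in nats (bounded above by ln 2)."""
    support = sorted(set(top_k_indices(p, k)) | set(top_k_indices(q, k)))
    p_s = [p[i] for i in support]
    q_s = [q[i] for i in support]
    p_s.append(max(0.0, 1.0 - sum(p_s)))  # residual mass outside the support
    q_s.append(max(0.0, 1.0 - sum(q_s)))

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i = 0 contribute nothing. When a_i > 0 the midpoint
        # satisfies b_i >= a_i / 2 > 0, so there is no division by zero.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p_s, q_s)]
    return 0.5 * kl(p_s, m) + 0.5 * kl(q_s, m)


def is_eligible(
    history: Sequence[Sequence[float]],  # per-step distributions, newest last
    conf_threshold: float = 0.9,         # hypothetical confidence threshold
    persist_steps: int = 2,              # hypothetical persistence window
    js_threshold: float = 0.05,          # hypothetical JSD threshold (nats)
    k: int = 5,                          # hypothetical K for top-K JSD
) -> bool:
    """A masked position is eligible only if all three conditions hold."""
    if len(history) < max(persist_steps, 2):
        return False  # not enough reverse steps observed yet
    current = history[-1]
    top1 = top_k_indices(current, 1)[0]
    # 1) High confidence in the current top-1 token.
    confident = current[top1] >= conf_threshold
    # 2) The same top-1 token across the last `persist_steps` reverse steps.
    persistent = all(
        top_k_indices(p, 1)[0] == top1 for p in history[-persist_steps:]
    )
    # 3) Small top-K JSD between the two most recent distributions.
    stable = top_k_js(history[-2], current, k) <= js_threshold
    return confident and persistent and stable


# Usage on a toy 4-token vocabulary.
stable = [[0.80, 0.10, 0.05, 0.05], [0.92, 0.04, 0.02, 0.02], [0.93, 0.03, 0.02, 0.02]]
flipping = [[0.50, 0.45, 0.03, 0.02], [0.45, 0.50, 0.03, 0.02], [0.91, 0.05, 0.02, 0.02]]
print(is_eligible(stable, persist_steps=3, k=2))    # peaked, persistent, stable
print(is_eligible(flipping, persist_steps=3, k=2))  # top-1 token changed
```

In a full sampler, this check would presumably run at each reverse step over every still-masked position, and only positions that pass would be committed. Requiring all three signals is what allows a peaked but newly changed prediction to stay masked: it passes the confidence test but fails persistence or the JSD test.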