Beyond parallel generation and global context modeling, current masked diffusion large language models (masked dLLMs, e.g., LLaDA) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density ($ρ$) of end-of-sequence ($\texttt{EOS}$) tokens serves as a reliable signal of generation sufficiency. In particular, the evolving implicit $\texttt{EOS}$ density during denoising reveals whether the current masked space is excessive or insufficient, thereby guiding the direction of length adjustment. Building on this insight, we propose $\textbf{$ρ$-$\texttt{EOS}$}$, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs. Unlike prior two-stage approaches, which require separate length-adjustment and iterative mask-insertion phases and support only unidirectional expansion, $\textbf{$ρ$-$\texttt{EOS}$}$ achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit $\texttt{EOS}$ density: excessively high density triggers $\texttt{MASK}$ token contraction, while insufficient density induces expansion. Extensive experiments on mathematics and code benchmarks demonstrate that $\textbf{$ρ$-$\texttt{EOS}$}$ achieves performance comparable to fixed-length decoding while substantially improving inference efficiency and token utilization. Code is available at https://github.com/yjyddq/rho-EOS.
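The control loop sketched in the abstract (estimate the implicit $\texttt{EOS}$ density, then contract or expand the masked span) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the thresholds `rho_high`/`rho_low`, and the step size `delta` are all illustrative assumptions, and the per-position $\texttt{EOS}$ probabilities are assumed to come from the denoiser at each step.

```python
import numpy as np

def eos_density(eos_probs, mask):
    """Estimate the implicit EOS density rho as the mean model probability
    of the EOS token over positions that are still masked.
    eos_probs: per-position P(EOS) from the denoiser (assumed available).
    mask:      boolean array, True where the position is still a MASK token."""
    masked = eos_probs[mask]
    return float(masked.mean()) if masked.size else 0.0

def adjust_length(rho, rho_high=0.6, rho_low=0.1, delta=16):
    """Bidirectional length adjustment (illustrative thresholds, not the
    paper's values): high density -> contract the masked span (remove
    MASK tokens), low density -> expand it (insert MASK tokens),
    otherwise keep the current length."""
    if rho > rho_high:
        return -delta  # masked space is excessive: contract
    if rho < rho_low:
        return +delta  # masked space is insufficient: expand
    return 0           # current length is adequate

# One hypothetical denoising step: three positions remain masked,
# and the model already assigns high P(EOS) to two of them.
probs = np.array([0.9, 0.05, 0.8, 0.02, 0.7])
mask = np.array([True, False, True, False, True])
rho = eos_density(probs, mask)
step = adjust_length(rho)
```

Because the density is re-estimated at every denoising step, the same loop can first expand an under-provisioned masked span and later contract it, which is what makes the adjustment bidirectional within a single generation pass.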