Beyond parallel generation and global context modeling, current masked diffusion large language models (masked dLLMs, e.g., LLaDA) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density ($ρ$) of end-of-sequence ($\texttt{EOS}$) tokens serves as a reliable signal of generation sufficiency. In particular, the evolving implicit $\texttt{EOS}$ density during denoising reveals whether the current masked space is excessive or insufficient, thereby guiding the direction of length adjustment. Building on this insight, we propose $\textbf{$ρ$-$\texttt{EOS}$}$, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs. Unlike prior two-stage approaches, which require separate length-adjustment and iterative mask-insertion phases and support only unidirectional expansion, $\textbf{$ρ$-$\texttt{EOS}$}$ achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit $\texttt{EOS}$ density: excessively high density triggers $\texttt{MASK}$ token contraction, while insufficient density induces expansion. Extensive experiments on mathematics and code benchmarks demonstrate that $\textbf{$ρ$-$\texttt{EOS}$}$ achieves performance comparable to fixed-length decoding while substantially improving inference efficiency and token utilization. Code is available at https://github.com/yjyddq/rho-EOS.
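The control loop sketched in the abstract (estimate the implicit $\texttt{EOS}$ density, then contract or expand the masked span) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the thresholds `rho_high`/`rho_low`, and the step size `delta` are all illustrative assumptions, and the per-position $\texttt{EOS}$ probabilities are assumed to come from the denoiser at each step.

```python
import numpy as np

def eos_density(eos_probs, mask):
    """Estimate the implicit EOS density rho as the mean model probability
    of the EOS token over positions that are still masked.
    eos_probs: per-position P(EOS) from the denoiser (assumed available).
    mask:      boolean array, True where the position is still a MASK token."""
    masked = eos_probs[mask]
    return float(masked.mean()) if masked.size else 0.0

def adjust_length(rho, rho_high=0.6, rho_low=0.1, delta=16):
    """Bidirectional length adjustment (illustrative thresholds, not the
    paper's values): high density -> contract the masked span (remove
    MASK tokens), low density -> expand it (insert MASK tokens),
    otherwise keep the current length."""
    if rho > rho_high:
        return -delta  # masked space is excessive: contract
    if rho < rho_low:
        return +delta  # masked space is insufficient: expand
    return 0           # current length is adequate

# One hypothetical denoising step: three positions remain masked,
# and the model already assigns high P(EOS) to two of them.
probs = np.array([0.9, 0.05, 0.8, 0.02, 0.7])
mask = np.array([True, False, True, False, True])
rho = eos_density(probs, mask)
step = adjust_length(rho)
```

Because the density is re-estimated at every denoising step, the same loop can first expand an under-provisioned masked span and later contract it, which is what makes the adjustment bidirectional within a single generation pass.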