Masked diffusion language models (MDLMs) generate text by denoising a preallocated masked response canvas, which makes response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of padding with repeated \texttt{[EOS]} tokens during instruction tuning, giving \texttt{[EOS]} a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of \texttt{[EOS]} overflow under large-block decoding. To decouple these roles, we propose VoidPadding, which introduces a dedicated \texttt{[VOID]} token for padding and reserves \texttt{[EOS]} for termination. During inference, the learned \texttt{[EOS]} signal enables early stopping, while the learned \texttt{[VOID]} signal guides adaptive expansion of the response canvas. On Dream-7B-Instruct, VoidPadding improves the four-task mean over mathematical reasoning and code generation benchmarks, averaged across block sizes, by \(+17.84\) points over the original model and \(+6.95\) points over RainbowPadding, while reducing the number of function evaluations (NFE) during decoding by 55.7\% on average. Code is available at https://github.com/Haru-LCY/VoidPadding.