Masked diffusion models (MDMs) have recently emerged as a promising paradigm for sequence generation. Scaling MDMs is conventionally achieved by increasing the parameter count or the number of denoising steps. We introduce Recursive Masked Diffusion Models (R-MDMs), which add recursive depth as a third scaling axis by repeatedly applying the same denoising transformer within each diffusion step. Recursion enables iterative refinement of the output through parameter reuse, increasing effective model depth without increasing parameter count. Across structured generation tasks, including Sudoku and Countdown, we show that R-MDMs achieve substantially improved parameter efficiency: a model with $L$ recursive iterations often matches the performance of non-recursive baselines with roughly $L\times$ more parameters. Moreover, recursive refinement can partially substitute for additional denoising steps, allowing recursive models to reach the same generation quality with fewer forward passes at inference time. These results suggest that recursive depth is a practically useful scaling mechanism for MDMs, improving both parameter efficiency and the allocation of test-time compute.
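The recursion described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: `ToyDenoiser`, `sample`, `MASK`, the tanh update, the confidence-based unmasking schedule, and the choice to reset the hidden state at every diffusion step are all hypothetical stand-ins chosen for illustration. The only property it aims to show is the one stated in the abstract: within each diffusion step the same weights are applied `num_recursions` times, so the number of block applications grows while the parameter count stays fixed.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token


class ToyDenoiser:
    """Stand-in for a denoising transformer: one weight set, reused on every call."""

    def __init__(self, vocab_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.weights = [[rng.uniform(-1, 1) for _ in range(vocab_size)]
                        for _ in range(vocab_size)]
        self.bias = [rng.uniform(-1, 1) for _ in range(vocab_size)]
        self.calls = 0  # counts applications of the shared block

    def num_params(self):
        return self.vocab_size ** 2 + self.vocab_size

    def __call__(self, tokens, hidden):
        self.calls += 1
        V = self.vocab_size
        # Inject observed (unmasked) tokens into the current hidden state.
        xs = []
        for tok, h in zip(tokens, hidden):
            x = list(h)
            if tok != MASK:
                x[tok] += 1.0
            xs.append(x)
        # Crude context mixing: average over positions (a placeholder for attention).
        ctx = [sum(x[i] for x in xs) / len(xs) for i in range(V)]
        out = []
        for h, x in zip(hidden, xs):
            z = [x[i] + ctx[i] for i in range(V)]
            y = [math.tanh(self.bias[j] + sum(z[i] * self.weights[i][j] for i in range(V)))
                 for j in range(V)]
            out.append([h[j] + y[j] for j in range(V)])  # residual refinement
        return out


def sample(denoiser, seq_len, num_steps, num_recursions):
    """Masked-diffusion-style sampling with recursive refinement inside each step."""
    V = denoiser.vocab_size
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        hidden = [[0.0] * V for _ in range(seq_len)]  # assumption: reset per step
        for _ in range(num_recursions):
            hidden = denoiser(tokens, hidden)  # same parameters on every iteration
        masked = [p for p, t in enumerate(tokens) if t == MASK]
        # Assumption: reveal an equal share of the remaining masked positions,
        # most confident first, and fill them greedily.
        k = math.ceil(len(masked) / (num_steps - step))
        masked.sort(key=lambda p: max(hidden[p]), reverse=True)
        for p in masked[:k]:
            tokens[p] = max(range(V), key=lambda v: hidden[p][v])
    return tokens


denoiser = ToyDenoiser(vocab_size=9)
tokens = sample(denoiser, seq_len=16, num_steps=4, num_recursions=3)
print(tokens)
print(denoiser.calls, denoiser.num_params())  # 12 block applications, 90 parameters
```

In this toy setting, changing `num_recursions` changes `denoiser.calls` (`num_steps * num_recursions`) but leaves `denoiser.num_params()` untouched, which is the sense in which recursive depth acts as a scaling axis separate from parameter count and denoising steps. The sketch says nothing about generation quality; it only makes the compute and parameter accounting concrete.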