Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of autoregressive large language models. In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative, non-autoregressive generation process. To exploit this, we propose UnMaskFork (UMF), a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search (MCTS) to optimize the generation path. In contrast to standard scaling methods that rely on stochastic sampling, UMF explores the search space through deterministic partial-unmasking actions performed by multiple MDLMs. Our empirical evaluation demonstrates that UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks, while also exhibiting strong scalability on mathematical reasoning tasks.
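The abstract describes the mechanism only at a high level, and the paper's implementation is not reproduced here. The following is a minimal, self-contained Python sketch of the general idea: an MCTS over a tree whose edges are deterministic partial-unmasking actions, each attributed to one of several proposer models. Every name below (`propose_unmask`, `score`, the toy token choices) is a hypothetical stand-in for illustration, not the UMF API; in the real framework, the proposers would be MDLM denoising steps and the scorer a verifier or reward model.

```python
import math
import random

MASK = None  # placeholder for a masked token position

def propose_unmask(state, proposer_id, k=2):
    """Deterministically unmask up to k positions (assumption: each
    proposer_id stands in for one MDLM and yields a different filling)."""
    state = list(state)
    filled = 0
    for i, tok in enumerate(state):
        if tok is MASK:
            state[i] = (proposer_id * 31 + i) % 50  # toy token choice
            filled += 1
            if filled == k:
                break
    return tuple(state)

def score(state):
    """Toy terminal reward (stand-in for a verifier/reward model)."""
    return sum(1 for t in state if t is not MASK and t % 2 == 0) / len(state)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}          # proposer_id -> child Node
        self.visits, self.value = 0, 0.0

    def is_terminal(self):
        return MASK not in self.state

def uct(child, parent, c=1.4):
    """Standard UCT score balancing exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(root_state, n_proposers=3, iters=200):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1) Selection: descend by UCT while the node is fully expanded.
        while not node.is_terminal() and len(node.children) == n_proposers:
            node = max(node.children.values(), key=lambda ch: uct(ch, node))
        # 2) Expansion: try one untried deterministic unmasking action.
        if not node.is_terminal():
            pid = next(p for p in range(n_proposers) if p not in node.children)
            child = Node(propose_unmask(node.state, pid), parent=node)
            node.children[pid] = child
            node = child
        # 3) Rollout: finish unmasking with a randomly chosen proposer per step.
        state = node.state
        while MASK in state:
            state = propose_unmask(state, random.randrange(n_proposers))
        reward = score(state)
        # 4) Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Commit to the most-visited first action (standard MCTS decision rule).
    best = max(root.children.values(), key=lambda ch: ch.visits)
    return best.state

if __name__ == "__main__":
    fully_masked = tuple([MASK] * 8)
    print(mcts(fully_masked))
```

Running the script grows the search tree for a fixed iteration budget and returns the partially unmasked sequence reached by the most-visited first action; in the actual framework, each expansion would invoke a full MDLM forward pass rather than a hash-based filler.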