Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics that greedily optimize local confidence, offering limited guidance for identifying unmasking paths that are globally consistent and accurate. To bridge this gap, we introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths. To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory. We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths. Extensive experiments across 6 reasoning tasks show that POKE-SMC consistently improves accuracy, achieving 2%--3% average gains over strong decoding-time scaling baselines at comparable inference overhead on LLaDA models and advancing the accuracy--compute Pareto frontier.
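The SMC-based path search described above can be illustrated with a minimal toy sketch. This is not the actual POKE-SMC implementation: `value_fn` stands in for the POKE value estimator (here an arbitrary placeholder scoring function), and particles are bare lists of unmasked positions rather than real decoding states.

```python
import math
import random

def smc_unmask(seq_len, num_particles, value_fn, steps, rng):
    """Toy SMC search over unmasking orders (illustrative only).

    Each particle is a partial unmasking path: an ordered list of
    token positions unmasked so far. `value_fn(path)` is a stand-in
    for a learned estimate of the expected future path log-likelihood.
    """
    particles = [[] for _ in range(num_particles)]
    for _ in range(steps):
        # Propose: extend each particle with one still-masked position.
        for p in particles:
            remaining = [i for i in range(seq_len) if i not in p]
            p.append(rng.choice(remaining))
        # Weight particles by the lookahead value estimate (softmax
        # over log-weights, shifted by the max for numerical stability).
        logw = [value_fn(p) for p in particles]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        total = sum(w)
        probs = [x / total for x in w]
        # Resample toward high-value partial trajectories.
        idx = rng.choices(range(num_particles), weights=probs, k=num_particles)
        particles = [list(particles[i]) for i in idx]
    # Return the highest-scoring surviving path.
    return max(particles, key=value_fn)
```

With a placeholder `value_fn` that simply prefers unmasking early positions, the search returns a duplicate-free partial path; in the real method the estimator would instead score trajectories by predicted future Path LL.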