Diffusion Language Models (DLMs) offer order-agnostic generation that can explore many possible decoding trajectories. However, current decoding methods commit to a single trajectory, limiting exploration of the trajectory space. We introduce Order-Token Search, which explores this space by jointly searching over generation order and token values. Its core is a likelihood estimator that scores denoising actions, enabling stable pruning and efficient exploration of diverse trajectories. Across mathematical reasoning and coding benchmarks (GSM8K, MATH500, Countdown, and HumanEval), Order-Token Search consistently outperforms decoding baselines, improving over the backbone model by 3.1%, 3.8%, 7.9%, and 6.8% absolute, respectively, and matching or surpassing the diffu-GRPO post-trained d1-LLaDA model. Our work establishes joint search as a key component for advancing decoding in DLMs.
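As a rough illustration of the idea described above, the sketch below performs a beam search jointly over which masked position to denoise next (the order) and which token to place there (the value), pruning partial trajectories by an estimated likelihood. This is a minimal assumption-laden sketch, not the paper's implementation: `score_fn`, `MASK`, and all parameter names are hypothetical stand-ins for the paper's likelihood estimator and denoising interface.

```python
import heapq
from dataclasses import dataclass, field

MASK = None  # stand-in for a still-masked (undecoded) position


@dataclass(order=True)
class Candidate:
    score: float                           # cumulative estimated log-likelihood
    tokens: tuple = field(compare=False)   # partially denoised sequence


def order_token_search(score_fn, vocab, length, beam_width=4, topk_tokens=3):
    """Beam search jointly over denoising order and token values.

    score_fn(tokens, pos, tok) -> float is a hypothetical likelihood
    estimator: the estimated log-likelihood gain from denoising
    position `pos` of `tokens` to token `tok`.
    """
    beam = [Candidate(0.0, (MASK,) * length)]
    for _ in range(length):                # one denoising action per step
        expansions = []
        for cand in beam:
            masked = [i for i, t in enumerate(cand.tokens) if t is MASK]
            for pos in masked:             # search over generation order ...
                gains = {tok: score_fn(cand.tokens, pos, tok) for tok in vocab}
                best = heapq.nlargest(topk_tokens, gains, key=gains.get)
                for tok in best:           # ... and over token values
                    new_tokens = list(cand.tokens)
                    new_tokens[pos] = tok
                    expansions.append(
                        Candidate(cand.score + gains[tok], tuple(new_tokens))
                    )
        # prune: keep only the highest-scoring partial trajectories
        beam = heapq.nlargest(beam_width, expansions)
    return max(beam, key=lambda c: c.score)


# Toy usage with a synthetic scorer that prefers token "a" early in the sequence.
toy_vocab = ["a", "b"]
toy_score = lambda toks, pos, tok: (1.0 if tok == "a" else 0.0) - 0.1 * pos
print(order_token_search(toy_score, toy_vocab, length=3).tokens)  # ('a', 'a', 'a')
```

In a real DLM the scorer would come from the model's predicted token distributions at masked positions, so joint pruning over (position, token) actions is what distinguishes this search from committing to a single fixed decoding order.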