Diffusion Language Models (DLMs) offer order-agnostic generation that can, in principle, explore many possible decoding trajectories. However, current decoding methods commit to a single trajectory, leaving the trajectory space largely unexplored. We introduce Order-Token Search, which explores this space by jointly searching over generation order and token values. At its core is a likelihood estimator that scores denoising actions, enabling stable pruning and efficient exploration of diverse trajectories. Across mathematical reasoning and coding benchmarks, Order-Token Search consistently outperforms baselines on GSM8K, MATH500, Countdown, and HumanEval (absolute gains of 3.1%, 3.8%, 7.9%, and 6.8% over the backbone), matching or surpassing d1-LLaDA post-trained with diffu-GRPO. Our work establishes joint search as a key component for advancing decoding in DLMs.
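The joint search described above can be illustrated with a minimal beam-search sketch. This is not the authors' implementation: the scorer here is a toy lookup table standing in for the paper's likelihood estimator over denoising actions, and the names `order_token_search` and `action_logprob` are hypothetical. Each beam item is a partially denoised sequence; every step expands it with all (masked position, token) actions and keeps the top-scoring candidates, which is the "stable pruning" role the estimator plays.

```python
# Hypothetical toy scorer: log-likelihood of the denoising action that
# places `token` at masked position `pos` of the partial sequence `seq`.
# In the paper, a learned likelihood estimator plays this role; here a
# fixed table is used purely for illustration.
def action_logprob(seq, pos, token, table):
    return table.get((pos, token), -10.0)

def order_token_search(length, vocab, table, beam_width=2):
    """Jointly search over generation order and token values.

    A beam item is (partial_sequence, cumulative_logprob), where None
    marks a still-masked position. At each step, every item is expanded
    with all (masked position, token) actions, the actions are scored,
    and only the top `beam_width` candidates are kept.
    """
    beams = [([None] * length, 0.0)]
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for pos in (i for i, t in enumerate(seq) if t is None):
                for token in vocab:
                    new_seq = list(seq)
                    new_seq[pos] = token
                    candidates.append(
                        (new_seq, score + action_logprob(seq, pos, token, table))
                    )
        # Prune: keep the top-k candidates by cumulative estimated
        # likelihood (Python's sort is stable, so ties keep their order).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy example with 2 positions and vocab {a, b}: the table rewards
# filling position 1 first with "b", then position 0 with "a", so the
# search discovers both a good order and good token values.
table = {(1, "b"): -0.1, (0, "a"): -0.2, (0, "b"): -3.0, (1, "a"): -2.0}
best_seq, best_score = order_token_search(2, ["a", "b"], table)[0]
```

The key difference from ordinary left-to-right beam search is that candidates vary in *which* position gets denoised, not just which token fills it, so trajectories with different generation orders compete under a single score.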