Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, giving them significant potential for parallel test-time scaling, which was previously constrained by the severe inefficiency of autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, at only a modest extra computational cost. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement: it samples multiple completions, identifies uncertain tokens via consistency analysis, regenerates them through voting, and repeats this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks. It achieves gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting.
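The sample-vote-refine loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `sample_fn` interface, the `toy_sampler` mock model, and the full-agreement uncertainty threshold are all assumptions made for the sketch.

```python
from collections import Counter
import random

def majority_vote(samples):
    """Position-wise majority vote over equal-length token sequences."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*samples)]

def uncertain_positions(samples):
    """Positions where the samples do not fully agree (illustrative criterion)."""
    n = len(samples)
    return [i for i, col in enumerate(zip(*samples))
            if Counter(col).most_common(1)[0][1] < n]

def dvoting(sample_fn, prompt, n_samples=5, max_iters=3):
    """Iterative refinement: sample, find uncertain tokens, re-vote, repeat.

    `sample_fn(prompt, fixed)` is a hypothetical interface: it returns a token
    sequence while keeping tokens at the positions in `fixed` unchanged,
    standing in for a dLLM's arbitrary-position generation.
    """
    fixed = {}
    seq = None
    for _ in range(max_iters):
        samples = [sample_fn(prompt, fixed) for _ in range(n_samples)]
        seq = majority_vote(samples)
        unc = uncertain_positions(samples)
        if not unc:
            break  # converged: all samples agree everywhere
        # Freeze the consistent (voted) tokens; only the still-uncertain
        # positions are regenerated in the next round.
        fixed = {i: t for i, t in enumerate(seq) if i not in unc}
    return seq

# Toy model: position 2 is noisy, all other positions are deterministic.
random.seed(0)
def toy_sampler(prompt, fixed):
    base = ["the", "answer", "?", "is", "42"]
    out = []
    for i, tok in enumerate(base):
        if i in fixed:
            out.append(fixed[i])
        elif i == 2:
            out.append(random.choice(["maybe", "surely", "surely"]))
        else:
            out.append(tok)
    return out

result = dvoting(toy_sampler, "toy prompt")
```

In this toy run, the consistent tokens are frozen after the first round, so later rounds spend compute only on the single disputed position, mirroring the abstract's claim that performance hinges on a small subset of variable tokens.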