DPRM: A Plug-in Doob h-transform-induced Token-Ordering Module for Diffusion Language Models

Diffusion language models generate without a fixed left-to-right order, leaving token ordering as a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which suffer from train-test mismatch and myopic exploration, respectively. We introduce DPRM (Doob h-transform Process Reward Model), a plug-in token-ordering module that keeps the host architecture, denoising objective, and supervision unchanged, and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove convergence of its stagewise Soft-BoN approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. Across nine hosts covering language reasoning, test-time scaling, protein, single-cell, molecular, DNA, text-to-image generation, and VQA, DPRM order variants improve several language, DNA, and multimodal settings while also identifying boundary cases where confidence-only ordering or task-specific utilities are preferable. Code is available at: https://github.com/DakeBU/DPRM-DLLM
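
To make the idea of shifting from confidence-driven to reward-guided ordering concrete, here is a minimal sketch, not the authors' implementation. It assumes a Gibbs (softmax) distribution over the still-masked positions, with a per-position score that linearly interpolates between a confidence value and a reward estimate. The names `reveal_distribution`, `linear_mix_schedule`, `mix`, and `temperature`, and the linear schedule itself, are hypothetical choices made for illustration. The exact DPRM score, the Soft-BoN approximation, and the bucketized controller are defined in the paper and the linked repository.

```python
import math
import random


def reveal_distribution(confidence, reward_estimate, mix, temperature=1.0):
    """Gibbs (softmax) distribution over masked positions.

    Illustrative assumption, not the paper's exact law:
        score_i = (1 - mix) * confidence_i + mix * reward_estimate_i
        p_i     proportional to exp(score_i / temperature)
    """
    if len(confidence) != len(reward_estimate) or not confidence:
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")

    scores = [(1.0 - mix) * c + mix * r
              for c, r in zip(confidence, reward_estimate)]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def linear_mix_schedule(step, total_steps):
    """Hypothetical schedule: 0 at the first step, 1 at the last step."""
    if total_steps <= 1:
        return 1.0
    return min(1.0, max(0.0, step / (total_steps - 1)))


def sample_position(probs, rng):
    """Draw the index of the next position to reveal."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    confidence = [0.9, 0.2, 0.5]        # toy per-position confidences
    reward_estimate = [0.1, 0.8, 0.4]   # toy online reward estimates
    for step in range(3):
        mix = linear_mix_schedule(step, total_steps=3)
        probs = reveal_distribution(confidence, reward_estimate, mix, 0.1)
        print(step, mix, sample_position(probs, rng))
```

With `mix = 0` the distribution favors the most confident position, and with `mix = 1` it favors the position with the highest reward estimate. A lower `temperature` concentrates the mass on the top-scoring position.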

