Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs grow more diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. To this end, we first investigate the decoding dynamics unique to MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from other models. Guided by this observation, we propose $\textbf{TIE}$ ($\textbf{T}$rajectory-based $\textbf{I}$terative $\textbf{E}$nsembling), a knowledge fusion framework in which MDLMs iteratively identify reliable decoding trajectories and relay them across models. TIE tracks confidence dynamics over answer-relevant positions to determine which model is currently on the more reliable trajectory, and selectively transfers partially denoised sequences across models. Because the model on the more promising trajectory often changes across denoising steps, TIE lets different models contribute complementary strengths at different stages of generation. Strong performance across diverse reasoning tasks, together with our analyses, suggests that TIE offers a practical approach to the underexplored problem of MDLM ensembling.
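The relay idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how reliability is scored or how states are transferred, so the `reliability` score (latest confidence minus its most recent swing), the `relay_decode` loop, and the two stand-in "models" below are all hypothetical choices made only to show the control flow.

```python
from typing import Callable, List, Optional, Tuple

MASK = None
Seq = List[Optional[str]]
# Toy stand-in for an MDLM: maps a partially masked sequence to one
# (token, confidence) guess per position. Hypothetical interface.
Model = Callable[[Seq], List[Tuple[str, float]]]


def reliability(history: List[float]) -> float:
    """Assumed score: latest answer confidence, penalized by its last swing."""
    if len(history) < 2:
        return history[-1]
    return history[-1] - abs(history[-1] - history[-2])


def relay_decode(models: List[Model], length: int,
                 answer_positions: List[int], steps: int):
    seqs: List[Seq] = [[MASK] * length for _ in models]
    histories: List[List[float]] = [[] for _ in models]
    leaders: List[int] = []
    per_step = -(-length // steps)  # ceiling division: tokens unmasked per step
    for _ in range(steps):
        for i, model in enumerate(models):
            preds = model(seqs[i])
            # Unmask this model's most confident masked positions.
            masked = [p for p in range(length) if seqs[i][p] is MASK]
            masked.sort(key=lambda p: preds[p][1], reverse=True)
            for p in masked[:per_step]:
                seqs[i][p] = preds[p][0]
            # Track mean confidence over the answer-relevant positions.
            conf = sum(preds[p][1] for p in answer_positions) / len(answer_positions)
            histories[i].append(conf)
        # Pick the model whose confidence trajectory currently looks most reliable.
        leader = max(range(len(models)), key=lambda i: reliability(histories[i]))
        leaders.append(leader)
        # Relay: every model continues from the leader's partially denoised sequence.
        seqs = [list(seqs[leader]) for _ in models]
    return seqs[0], leaders


def model_a(seq: Seq) -> List[Tuple[str, float]]:
    # Confident early (many masks), unsure late.
    conf = 0.9 if seq.count(MASK) > 2 else 0.3
    return [("a", conf)] * len(seq)


def model_b(seq: Seq) -> List[Tuple[str, float]]:
    # Steady, moderate confidence throughout.
    return [("b", 0.6)] * len(seq)


final, leaders = relay_decode([model_a, model_b], length=4,
                              answer_positions=[3], steps=2)
print(final, leaders)
```

In this toy run the leading model changes between the two denoising steps, so the final sequence mixes tokens from both stand-in models, which mirrors the claim that different models can contribute at different stages of generation.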