Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that promotes mode-seeking distillation and encourages the student to concentrate on high-probability teacher modes. Across benchmarks, our approach consistently outperforms strong few-step baselines and standard training under tight step budgets. Although full-step decoding remains superior, we substantially narrow the gap, establishing a strong foundation towards practical few-step DLLMs. The source code is available at https://github.com/Tyrion58/T3D.
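The mode-seeking behavior that the abstract attributes to the reverse-KL objective can be illustrated with a small numeric sketch. This is not the paper's DDO implementation; the three-token teacher distribution and the two candidate students below are made-up toy values chosen only to show that reverse KL prefers a student concentrated on one teacher mode, while forward KL prefers a student that spreads mass over all modes.

```python
import math

def kl(q, p):
    """KL(q || p) for two discrete distributions over the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

# Toy bimodal "teacher": two high-probability modes with a low-density valley.
teacher = [0.5, 0.0001, 0.4999]

# Candidate "students" (hypothetical): one concentrates on a single mode,
# the other spreads mass across both modes and the valley.
student_mode = [0.98, 0.01, 0.01]
student_cover = [0.4, 0.2, 0.4]

# Reverse KL(student || teacher): heavily penalizes student mass where the
# teacher is near zero, so it favors the mode-concentrated student.
rkl_mode = kl(student_mode, teacher)
rkl_cover = kl(student_cover, teacher)

# Forward KL(teacher || student): penalizes ignoring a teacher mode, so it
# favors the mass-covering student instead.
fkl_mode = kl(teacher, student_mode)
fkl_cover = kl(teacher, student_cover)

print(rkl_mode < rkl_cover)   # reverse KL is mode-seeking
print(fkl_cover < fkl_mode)   # forward KL is mass-covering
```

Under the reverse-KL objective the distilled few-step model is thus pushed toward high-probability teacher modes rather than toward an average over all of them, which is the behavior the abstract describes.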