Autoregressive large language models achieve strong results on many benchmarks, but decoding remains latency-limited by the sequential dependence on previously generated tokens. Diffusion language models (DLMs) promise parallel generation but suffer from a fundamental static-to-dynamic misalignment: training optimizes local transitions under fixed schedules, whereas efficient inference requires adaptive "long-jump" refinements through unseen states. Our goal is to enable highly parallel decoding for DLMs with a small number of function evaluations while preserving generation quality. To achieve this, we propose CD4LM, a framework that decouples training from inference via Discrete-Space Consistency Distillation (DSCD) and Confidence-Adaptive Decoding (CAD). Unlike standard objectives, DSCD trains a student to be trajectory-invariant, mapping diverse noisy states directly to the clean distribution. This intrinsic robustness enables CAD to dynamically allocate compute based on token confidence, aggressively skipping steps without the quality collapse typical of heuristic acceleration. On GSM8K, CD4LM matches the LLaDA baseline with a 5.18x wall-clock speedup; across code and math benchmarks, it establishes a strictly dominant accuracy-efficiency Pareto frontier, achieving a 3.62x mean speedup while improving average accuracy. Code is available at https://github.com/yihao-liang/CDLM.
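To make the confidence-adaptive decoding idea concrete, the sketch below illustrates one way such a loop could work: at each refinement step, every masked position whose predicted confidence clears a threshold is committed in parallel, and steps are effectively skipped wherever the model is already certain. This is a minimal illustration under assumed interfaces (a HuggingFace-style `model(...)` returning `.logits`, a `mask_id` placeholder token, and a threshold `tau` are all hypothetical names), not the released CD4LM/CAD implementation.

```python
# Minimal sketch of confidence-adaptive parallel decoding for a masked
# diffusion language model. The model interface, mask_id, and tau are
# illustrative assumptions, not the CD4LM code.
import torch

@torch.no_grad()
def confidence_adaptive_decode(model, x, mask_id, tau=0.9, max_steps=64):
    """Iteratively unmask tokens whose predicted confidence exceeds tau.

    x: 1-D LongTensor holding prompt tokens plus mask_id placeholders
       for the positions to be generated.
    """
    for _ in range(max_steps):
        masked = x == mask_id
        if not masked.any():                      # all positions filled: done
            break
        logits = model(x.unsqueeze(0)).logits[0]  # (seq_len, vocab), assumed API
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)            # per-token confidence and argmax
        # Commit every masked position above the confidence threshold; if none
        # qualifies, fall back to the single most confident masked token so the
        # loop always makes progress.
        accept = masked & (conf >= tau)
        if not accept.any():
            idx = torch.where(masked, conf, conf.new_full(conf.shape, -1.0)).argmax()
            accept[idx] = True
        x[accept] = pred[accept]
    return x
```

In such a scheme, the threshold governs the parallelism-quality trade-off; the abstract's argument is that a trajectory-invariant, distillation-trained student is what makes aggressive thresholds safe.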