Teaching Diffusion to Speculate Left-to-Right

Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs because tokens are generated sequentially. Speculative decoding addresses this bottleneck by employing a lightweight draft model to propose multiple future tokens, which a larger target model then verifies in parallel. Recent work has demonstrated that diffusion language models are well suited to this setting, as they can generate entire blocks of draft tokens in parallel and thereby alleviate the sequential constraints of autoregressive drafting. A subtlety of this regime is that block-diffusion drafters generate tokens bidirectionally within a block, whereas verification is performed by an autoregressive target model that evaluates tokens strictly left to right. This leaves a gap between the symmetric training-time objective and the asymmetric verification-time reward. In this work, we offer an empirical analysis of three training-time interventions that narrow this gap: token positional weighting, a first-error focal loss that targets the position that breaks the accepted prefix within each block, and a chain loss term that substitutes a differentiable surrogate for the expected accepted length. The three interventions act along orthogonal axes (position, block-conditional first error, joint prefix) and compose additively; they are likewise orthogonal to test-time alignment mechanisms such as multi-draft self-selection, with which they can in principle be combined. Across four target models and six reasoning, code, and dialogue benchmarks, the three interventions raise accepted draft length by 21-76% per benchmark over a position-uniform baseline, without adding forward passes and without changing the inference pipeline or the rejection-sampling exactness contract.
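
To make the three interventions concrete, the following is a minimal sketch of how they might be combined for a single draft block. The abstract does not give the functional forms, so everything here is an assumption made for illustration: the geometric decay schedule, the multiplicative boost used in place of a true focal term, the treatment of per-position match probabilities as independent, and all function names and coefficients (`decay`, `focal_boost`, `chain_coef`). Under that independence assumption, the expected accepted length equals the sum of cumulative products of the per-position probabilities, which is differentiable in those probabilities.

```python
import numpy as np


def position_weights(block_len, decay=0.8):
    """Hypothetical geometric schedule: earlier positions get larger weight."""
    w = decay ** np.arange(block_len)
    return w / w.sum()


def first_error_index(pred_tokens, target_tokens):
    """Index of the first draft/target mismatch, or None if the block matches."""
    mismatch = np.flatnonzero(np.asarray(pred_tokens) != np.asarray(target_tokens))
    return int(mismatch[0]) if mismatch.size else None


def expected_accepted_length(p_correct):
    """E[L] = sum_k prod_{i<=k} p_i, assuming independent per-position matches."""
    return float(np.cumprod(p_correct).sum())


def combined_loss(p_correct, pred_tokens, target_tokens,
                  decay=0.8, focal_boost=2.0, chain_coef=0.1):
    """Illustrative per-block loss; not the paper's actual objective."""
    p = np.clip(np.asarray(p_correct, dtype=float), 1e-12, 1.0)
    nll = -np.log(p)  # per-position cross-entropy on the target token

    # (1) Positional weighting.
    w = position_weights(len(p), decay)

    # (2) First-error emphasis: upweight the position that breaks the prefix.
    j = first_error_index(pred_tokens, target_tokens)
    if j is not None:
        w = w.copy()
        w[j] *= focal_boost

    # (3) Chain term: reward a longer expected accepted prefix.
    chain = -expected_accepted_length(p) / len(p)

    return float((w * nll).sum()) + chain_coef * chain
```

The chain term illustrates why left-to-right verification is asymmetric: with probabilities `[0.5, 0.9, 0.9]` the expected accepted length is lower than with `[0.9, 0.9, 0.5]`, although both blocks have the same position-uniform cross-entropy.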
