Speculative decoding must produce an output distribution identical to that of standard autoregressive generation; this output equivalence is not an optimization target but the defining criterion of valid speculative decoding. We demonstrate that all existing batch speculative decoding implementations violate this fundamental requirement, producing corrupted outputs that range from repetitive tokens to gibberish. These failures stem from the ragged tensor problem: sequences in the same batch accept different numbers of draft tokens, which desynchronizes position IDs, attention masks, and KV-cache state. We present the first authentic batch speculative decoding framework. We (1) formalize the synchronization invariants that valid batch speculative decoding must satisfy; (2) present EQSPEC, the first algorithm that guarantees output equivalence, and analyze its cost structure to show that alignment overhead grows superlinearly and consumes up to 40\% of computation; and (3) introduce EXSPEC, which reduces this overhead through cross-batch scheduling that dynamically groups same-length sequences. On SpecBench with the Vicuna-7B/68M, Qwen3-8B/0.6B, and GLM-4-9B/0.6B model pairs, our methods achieve up to 3$\times$ throughput improvement at batch size 8 while maintaining algorithmic correctness. They reach 95\% decoding equivalence; the residual divergence is attributable to floating-point non-determinism in GPU inference, not to the synchronization failures that drive prior methods to near-zero equivalence. Our code is available at https://github.com/eBay/spec_dec.
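The ragged tensor problem, and the two ways of dealing with it that the abstract contrasts (realigning a ragged batch versus grouping sequences that already share a length), can be illustrated with a minimal pure-Python sketch. It is not the authors' EQSPEC or EXSPEC code: the helper names (`append_accepted`, `left_pad_batch`, `padding_fraction`, `group_by_length`) and the pad id are hypothetical. It assumes left-padding as the realignment strategy and covers only token ids, attention masks, and position IDs, not the KV cache, which a real system must realign as well.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PAD = -1  # hypothetical pad token id used only in this sketch


def append_accepted(seqs: List[List[int]], accepted: List[List[int]]) -> List[List[int]]:
    """Append each sequence's accepted tokens after one verification step.

    Sequences accept different numbers of draft tokens, so a batch that was
    rectangular before verification becomes ragged afterwards.
    """
    return [s + a for s, a in zip(seqs, accepted)]


def left_pad_batch(seqs: List[List[int]]) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Restore a rectangular batch by left-padding to the longest sequence.

    Returns token ids, an attention mask (1 = real token, 0 = pad), and
    position ids that count only real tokens, so each sequence keeps the
    positions it would have if it were decoded alone.
    """
    max_len = max(len(s) for s in seqs)
    ids, mask, pos = [], [], []
    for s in seqs:
        n_pad = max_len - len(s)
        ids.append([PAD] * n_pad + s)
        mask.append([0] * n_pad + [1] * len(s))
        # Pad slots get position 0 (they are masked out); real tokens get 0..len(s)-1.
        pos.append([0] * n_pad + list(range(len(s))))
    return ids, mask, pos


def padding_fraction(mask: List[List[int]]) -> float:
    """Fraction of batch slots spent on padding, i.e. wasted computation."""
    total = sum(len(row) for row in mask)
    pads = sum(row.count(0) for row in mask)
    return pads / total


def group_by_length(seqs: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Group sequence ids by current length; each group needs no padding."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for sid, s in seqs.items():
        groups[len(s)].append(sid)
    return dict(groups)


if __name__ == "__main__":
    # Sequence 0 accepts three tokens, sequence 1 accepts only one.
    batch = append_accepted([[5, 6, 7], [8, 9, 10]], [[11, 12, 13], [14]])
    ids, mask, pos = left_pad_batch(batch)
    print(ids)                     # [[5, 6, 7, 11, 12, 13], [-1, -1, 8, 9, 10, 14]]
    print(pos)                     # [[0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 2, 3]]
    print(padding_fraction(mask))  # 2 of 12 slots are padding
    print(group_by_length({0: batch[0], 1: batch[1], 2: [1, 2, 3, 4]}))  # {6: [0], 4: [1, 2]}
```

Left-padding keeps every sequence in one batch but pays for pad slots, and that waste compounds as acceptance counts diverge over successive steps. Grouping by length, the idea behind cross-batch scheduling, sidesteps padding by only batching sequences whose lengths already agree.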