Judge Decoding accelerates LLM inference by relaxing the strict verification step of Speculative Decoding, yet it typically relies on expensive and noisy supervision. In this work, we revisit this paradigm from first principles, revealing that the ``criticality'' scores learned via costly supervision are intrinsically encoded in the draft-target distributional divergence. We theoretically prove a structural correspondence between learned linear judges and Kullback-Leibler (KL) divergence, demonstrating that they rely on the same underlying logit primitives. Guided by this, we propose a simple, training-free verification mechanism based on KL divergence. Extensive experiments across reasoning and coding benchmarks show that our method matches or outperforms complex trained judges (e.g., AutoJudge), offering superior robustness to domain shifts and eliminating the supervision bottleneck entirely.
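As an illustrative sketch only (the notation $p_t$, $q_t$, the direction of the divergence, and the threshold $\tau$ are assumptions made here, not specifics stated in the abstract): writing $q_t$ and $p_t$ for the draft and target distributions at position $t$, a training-free verification rule of the kind described could accept a drafted token whenever their divergence is small,
\[
\text{accept}_t \;\iff\; D_{\mathrm{KL}}\!\left(q_t \,\middle\|\, p_t\right) \;=\; \sum_{v \in \mathcal{V}} q_t(v)\,\log \frac{q_t(v)}{p_t(v)} \;\le\; \tau ,
\]
replacing the learned judge's criticality score with a quantity computed directly from the two models' logits.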