Speculative decoding has proven effective at accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose \emph{Kangaroo}, a novel self-speculative decoding framework that uses a fixed shallow sub-network as a self-draft model, with the remaining layers serving as the larger target model. We train a lightweight and efficient adapter module on top of the sub-network to bridge the gap between the representation ability of the sub-network and that of the full model. Notably, the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies that increase the token acceptance rate while minimizing the number of drafting steps of the small model. To address this challenge, we introduce an additional early exiting mechanism for generating draft tokens. Specifically, we halt the small model's subsequent prediction during the drafting phase once the confidence level for the current token falls below a certain threshold. Extensive experiments on Spec-Bench demonstrate the effectiveness of Kangaroo. Under single-sequence verification, Kangaroo achieves speedups of up to $1.68\times$ on Spec-Bench, outperforming Medusa-1 with 88.7\% fewer additional parameters (67M compared to 591M). The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo.
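The confidence-based early exit in the drafting phase can be illustrated with a minimal sketch. This is not the Kangaroo implementation: the function `draft_with_early_exit`, the callable `draft_probs`, the toy model `toy_draft_probs`, the threshold value 0.6, and the step limit are all hypothetical names and values chosen for illustration. The sketch also assumes greedy drafting and assumes that the low-confidence token is kept while further drafting is halted; the abstract does not specify either detail.

```python
from typing import Callable, List, Sequence


def draft_with_early_exit(
    prefix: Sequence[int],
    draft_probs: Callable[[Sequence[int]], Sequence[float]],
    threshold: float = 0.6,   # assumed value, not taken from the paper
    max_steps: int = 6,       # assumed cap on drafting steps
) -> List[int]:
    """Greedily draft tokens, halting once confidence drops below threshold."""
    tokens = list(prefix)
    drafted: List[int] = []
    for _ in range(max_steps):
        probs = draft_probs(tokens)
        # Greedy choice: index of the most probable next token.
        token = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[token]
        drafted.append(token)
        tokens.append(token)
        # Early exit: stop all subsequent drafting after a low-confidence token
        # (keeping that token is an assumption of this sketch).
        if confidence < threshold:
            break
    return drafted


def toy_draft_probs(tokens: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the self-draft model over a 4-token vocabulary."""
    schedule = [0.9, 0.8, 0.5, 0.9]            # made-up confidence per position
    conf = schedule[(len(tokens) - 1) % 4]
    nxt = (tokens[-1] + 1) % 4                 # toy rule: predict last token + 1
    return [conf if i == nxt else (1.0 - conf) / 3 for i in range(4)]


if __name__ == "__main__":
    print(draft_with_early_exit([0], toy_draft_probs, threshold=0.6))
```

With the toy schedule above, drafting stops at the third token, whose confidence of 0.5 falls below the threshold. The drafted tokens would then be passed to the full model for verification, which is outside the scope of this sketch.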