Speculative Decoding (SD) is a key technique for accelerating Large Language Model (LLM) inference, but it typically requires training a draft model on a large dataset. We approach this problem from a data-centric perspective, finding that not all training samples contribute equally to the SD acceptance rate. Specifically, our theoretical analysis and empirical validation reveal that tokens inducing flatter predictive distributions from the target model are more valuable than those yielding sharply peaked distributions. Based on this insight, we propose flatness, a new metric that quantifies this property, and develop the Sample-level-flatness-based Dataset Distillation (SFDD) approach, which filters the training data to retain only the most valuable samples. Experiments on the EAGLE framework demonstrate that SFDD achieves over a 2$\times$ training speedup using only 50% of the data, while keeping the final model's inference speedup within 4% of the full-dataset baseline. This work introduces an effective, data-centric approach that substantially improves training efficiency for Speculative Decoding. Our code is available at https://github.com/fjm9933/Flatness.
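To make the idea concrete, here is a minimal sketch of the filtering step described above. It assumes flatness is measured as the mean entropy of the target model's per-token predictive distributions (the abstract does not fix the exact formula, so this entropy-based proxy and the function names `flatness_score` / `distill` are illustrative assumptions), and it keeps the flattest 50% of samples, matching the data budget reported in the abstract.

```python
import numpy as np

def flatness_score(logits):
    """Illustrative per-sample flatness: mean entropy of the target
    model's per-token predictive distributions (higher = flatter).
    `logits` has shape (seq_len, vocab_size)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)  # softmax
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)  # per-token entropy
    return float(entropy.mean())

def distill(samples_logits, keep_frac=0.5):
    """Keep the indices of the top `keep_frac` fraction of samples,
    ranked by flatness (flattest first)."""
    scores = np.array([flatness_score(l) for l in samples_logits])
    k = max(1, int(len(scores) * keep_frac))
    keep = np.argsort(scores)[::-1][:k]  # flattest samples
    return sorted(keep.tolist())
```

In practice the scores would come from a single forward pass of the frozen target model over the draft-training corpus, so the filtering cost is small relative to the draft-model training it replaces.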