Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate at either the sample level or the token level in isolation and fail to optimize both dimensions jointly. This disconnect causes significant inefficiencies: high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the Error-Uncertainty (EU) Plane, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this diagnosis, we propose Quadrant-based Tuning (Q-Tuning), a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a 38% average improvement over the full-data SFT baseline while using only 12.5% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT.
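
The abstract names the two stages of Q-Tuning but not their formulas, so the following is only a hypothetical sketch of how such a pipeline could be wired up, not the authors' implementation. It assumes that the error axis of the EU Plane is a sample's mean token negative log-likelihood (NLL), that the uncertainty axis is its mean predictive entropy, that fixed thresholds `err_thr` and `unc_thr` split the plane into quadrants, and that the context-aware token score blends each token's NLL with the mean NLL of its immediate neighbours (weight `lam`). The quadrant labels `noise` and `redundant`, the rule that higher-scoring tokens are the salient ones, and every function name here are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    # Hypothetical per-token statistics from a forward pass of the current model.
    token_nll: list[float]      # negative log-likelihood of each target token
    token_entropy: list[float]  # entropy of the predictive distribution at each position


def quadrant(s: Sample, err_thr: float, unc_thr: float) -> str:
    """Place a sample on an error-uncertainty plane (assumed axes: mean NLL, mean entropy)."""
    high_err = mean(s.token_nll) >= err_thr
    high_unc = mean(s.token_entropy) >= unc_thr
    if high_err and not high_unc:
        return "misconception"  # confidently wrong: kept (assumed mapping)
    if not high_err and high_unc:
        return "calibration"    # right but unsure: kept (assumed mapping)
    if high_err and high_unc:
        return "noise"          # hypothetical label: discarded
    return "redundant"          # hypothetical label for mastered data: discarded


def context_scores(nll: list[float], lam: float = 0.5) -> list[float]:
    """Hypothetical context-aware score: blend each token's NLL with its neighbours' mean."""
    scores = []
    for i in range(len(nll)):
        neighbours = nll[max(0, i - 1):i] + nll[i + 1:i + 2]
        ctx = mean(neighbours) if neighbours else nll[i]
        scores.append((1 - lam) * nll[i] + lam * ctx)
    return scores


def token_mask(s: Sample, kind: str, keep_ratio: float) -> list[bool]:
    """Asymmetric policy: trim only misconception samples; keep calibration samples whole."""
    n = len(s.token_nll)
    if kind != "misconception":
        return [True] * n
    scores = context_scores(s.token_nll)
    k = max(1, round(keep_ratio * n))
    # Keep the k highest-scoring tokens (assumption: higher score = more salient).
    keep = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [i in keep for i in range(n)]


def q_tuning_select(samples, err_thr, unc_thr, keep_ratio):
    """Stage 1 (sample triage) then stage 2 (token masks); returns (sample, mask) pairs."""
    selected = []
    for s in samples:
        kind = quadrant(s, err_thr, unc_thr)
        if kind in ("misconception", "calibration"):
            selected.append((s, token_mask(s, kind, keep_ratio)))
    return selected


# Toy usage with made-up statistics, one sample per quadrant.
data = [
    Sample([3.0, 0.1, 2.5, 0.2], [0.2, 0.1, 0.3, 0.2]),  # high error, low uncertainty
    Sample([0.2, 0.3, 0.1], [2.0, 1.8, 2.2]),            # low error, high uncertainty
    Sample([0.1, 0.1], [0.1, 0.2]),                      # low error, low uncertainty
    Sample([3.0, 2.8], [2.5, 2.6]),                      # high error, high uncertainty
]
kept = q_tuning_select(data, err_thr=1.0, unc_thr=1.0, keep_ratio=0.5)
for sample, mask in kept:
    print(mask)
```

On the toy data, only the first two samples survive triage. The misconception sample keeps two of its four tokens; because scores are smoothed with neighbours, the low-NLL token sitting between two high-NLL tokens outranks an isolated high-NLL token. The calibration sample bypasses token scoring entirely, which is what makes the policy asymmetric.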

翻译：随着监督微调（SFT）从一个轻量级的后训练步骤演变为一个计算密集型阶段，其规模可与中期训练相媲美，在预算紧张的情况下，数据效率对于对齐大型语言模型（LLM）变得至关重要。现有的数据剪枝方法存在设计上的割裂：它们要么孤立地在样本层面操作，要么在标记层面操作，未能联合优化这两个维度。这种脱节导致了显著的效率低下——高价值样本可能仍包含冗余标记，而标记级剪枝往往会丢弃嵌入在单个示例中的关键指令或校正信号。为了解决这一瓶颈，我们引入了误差-不确定性（EU）平面，这是一个诊断框架，可联合表征训练数据在样本和标记层面的异质效用。基于这一洞见，我们提出了基于象限的微调（Q-Tuning），一个统一框架，用于战略性地协调样本剪枝和标记剪枝。Q-Tuning采用两阶段策略：首先，执行样本级分类以保留富含信息性误解或校准信号的示例；其次，应用非对称标记剪枝策略，使用上下文感知的评分机制，专门从误解样本中修剪显著性较低的标记，同时完整保留校准样本。我们的方法在五个不同的基准测试中均达到了新的最优水平。值得注意的是，在SmolLM2-1.7B上，Q-Tuning仅使用原始训练数据的12.5%，就比全数据SFT基线平均提升了+38%。作为首个持续超越全数据训练的动态剪枝方法，Q-Tuning为在预算受限的LLM SFT中最大化数据利用率提供了一个实用且可扩展的蓝图。