In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which identifies and removes the least informative samples from the training set. The core insight behind MoSo is to determine the importance of each sample by assessing its impact on the optimal empirical risk, that is, by measuring how much the empirical risk changes when that sample is excluded from the training set. Instead of using the computationally expensive leave-one-out retraining procedure, we propose an efficient first-order approximator that only requires gradient information from different training stages. The key idea behind our approximation is that samples whose gradients are consistently aligned with the average gradient of the training set are more informative and should receive higher scores. Intuitively, if the gradient of a specific sample is consistent with the average gradient vector, then optimizing the network on that sample will have a similar effect on all remaining samples. Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios and achieves satisfactory performance across various settings.
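The gradient-alignment idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the exact scoring formula, so the sketch assumes that a sample's score is the learning-rate-weighted inner product between its own gradient and the mean gradient of the remaining samples, averaged over a few saved checkpoints. The toy linear model with squared loss and the names `per_sample_grads`, `alignment_scores`, and `keep_top` are hypothetical and chosen only for illustration.

```python
def per_sample_grads(w, X, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w, one list per sample."""
    grads = []
    for x_i, y_i in zip(X, y):
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
        grads.append([residual * x_j for x_j in x_i])
    return grads


def alignment_scores(checkpoints, lrs, X, y):
    """Assumed scoring rule: average over checkpoints of
    lr * <gradient of sample i, mean gradient of all other samples>."""
    n = len(X)
    scores = [0.0] * n
    for w, lr in zip(checkpoints, lrs):
        grads = per_sample_grads(w, X, y)
        total = [sum(col) for col in zip(*grads)]  # sum of all sample gradients
        for i, g in enumerate(grads):
            # Mean gradient of the training set with sample i left out.
            mean_others = [(t - g_j) / (n - 1) for t, g_j in zip(total, g)]
            scores[i] += lr * sum(a * b for a, b in zip(g, mean_others))
    return [s / len(checkpoints) for s in scores]


def keep_top(scores, keep_ratio):
    """Indices of the highest-scoring samples, returned in ascending order."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: the third sample's gradient opposes the other two at w = 0.
X = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
y = [1.0, 1.0, 1.0]
scores = alignment_scores(checkpoints=[[0.0, 0.0]], lrs=[1.0], X=X, y=y)
kept = keep_top(scores, keep_ratio=2 / 3)
```

In this toy setting the third sample receives the lowest score, because its gradient points against the mean gradient of the others, so it is the one pruned when two of the three samples are kept. For a deep network, the per-sample gradients would come from automatic differentiation at checkpoints saved during training rather than from a closed-form expression.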