Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs of per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, specialized for linear layers, both of which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate that our approach achieves substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
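The abstract does not spell out the compression scheme itself; as a rough illustration of the kind of sparsity-aware gradient sketching it describes, below is a minimal count-sketch-style example in NumPy. The function `sketch_sparse_grad`, the hash constants, and all parameters are hypothetical illustrations under our own assumptions, not the authors' GraSS implementation; the key point is that the cost scales with the number of nonzero gradient entries rather than the full parameter dimension.

```python
import numpy as np

def sketch_sparse_grad(grad, proj_dim, seed=0):
    """Hypothetical sketch: compress a sparse per-sample gradient by hashing
    each nonzero coordinate to one output bucket with a pseudo-random sign.

    The bucket and sign are derived deterministically from (index, seed), so
    sketches of different samples live in the same space and
    <sketch(g1), sketch(g2)> approximates <g1, g2>. Cost is O(nnz(grad)),
    i.e. sub-linear in the full dimension when the gradient is sparse.
    """
    idx = np.flatnonzero(grad)                 # touch only nonzero entries
    h = (idx + seed) * 2654435761              # multiplicative integer hash
    buckets = h % proj_dim                     # low bits pick the bucket
    signs = 1.0 - 2.0 * ((h >> 20) & 1)        # a higher bit picks the sign
    out = np.zeros(proj_dim)
    np.add.at(out, buckets, signs * grad[idx])  # scatter-add into the sketch
    return out

# Two sparse "per-sample gradients" in a million-dimensional space.
d = 1_000_000
g1 = np.zeros(d); g1[[3, 17, 42_000]] = [0.5, -1.2, 2.0]
g2 = np.zeros(d); g2[[3, 99, 42_000]] = [1.0, 0.7, -0.5]
s1 = sketch_sparse_grad(g1, proj_dim=4096)
s2 = sketch_sparse_grad(g2, proj_dim=4096)
print(np.dot(g1, g2), np.dot(s1, s2))  # true vs. sketched inner product
```

Shared coordinates hash to the same bucket with the same sign, so their contributions to the inner product are preserved exactly; error comes only from hash collisions between distinct coordinates, which shrinks as `proj_dim` grows.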