Random projections, or sketches, of gradients and Hessian-vector products play an essential role in applications where one needs to store many such vectors while retaining accurate information about their relative geometry. Two important scenarios are training data attribution (tracing a model's behavior to the training data), where one needs to store a gradient for each training example, and the study of the spectrum of the Hessian (to analyze the training dynamics), where one needs to store multiple Hessian-vector products. While sketches that use dense matrices are easy to implement, they are memory-bound and do not scale to modern neural networks. Motivated by work on the intrinsic dimension of neural networks, we propose and study a design space for scalable sketching algorithms. We demonstrate the efficacy of our approach in three applications: training data attribution, the analysis of the Hessian spectrum, and the computation of the intrinsic dimension when fine-tuning pre-trained language models.
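To make the memory limitation of dense sketches concrete, the following is a minimal illustrative sketch of a dense Gaussian random projection, written in pure Python. It is not the scalable method proposed here; the function names (`dense_sketch_matrix`, `sketch`, `dot`), the dimensions, and the synthetic "gradients" are all assumptions chosen for illustration. The point is that the projection matrix alone needs `k * d` entries, which becomes prohibitive when `d` is the parameter count of a modern network.

```python
import math
import random


def dense_sketch_matrix(k, d, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def sketch(S, v):
    """Project a length-d vector v to length k by computing S @ v."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Toy sizes (assumed): d stands in for the number of model parameters.
d, k = 2000, 256
rng = random.Random(1)

# Two synthetic, correlated vectors standing in for per-example gradients.
g1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
g2 = [0.5 * x + rng.gauss(0.0, 1.0) for x in g1]

S = dense_sketch_matrix(k, d)
s1, s2 = sketch(S, g1), sketch(S, g2)

exact = dot(g1, g2)      # inner product in the original space
approx = dot(s1, s2)     # estimate from the two k-dimensional sketches

# The dense matrix itself is the bottleneck: k * d stored entries.
matrix_entries = k * d
print(f"exact={exact:.1f}  sketched={approx:.1f}  matrix entries={matrix_entries}")
```

Each stored vector shrinks from `d` to `k` numbers while inner products are preserved in expectation, but the cost of holding (or regenerating) the `k x d` matrix grows linearly in `d`, which is the scaling problem that motivates structured alternatives.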