The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak.