We study gradient-based data attribution, which aims to identify the training examples that most influence a given model output. Existing methods either treat all network parameters uniformly or rely on implicit weighting derived from Hessian approximations; neither fully models the functional heterogeneity of network parameters. To address this, we propose a method that explicitly learns parameter importance weights directly from data, without requiring annotated labels. Our approach improves attribution accuracy across diverse tasks, including image classification, language modeling, and diffusion, and enables fine-grained attribution for concepts such as subject and style.
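
To make the role of parameter importance weights concrete, the following is a minimal sketch of a weighted gradient-similarity score on a toy linear model. It is an illustration under stated assumptions, not the method proposed here: the scoring rule (a per-parameter weighted inner product of loss gradients), the toy data, and all function names are hypothetical, and the weights are hand-picked stand-ins rather than learned from data.

```python
# Illustrative sketch only: weighted gradient-similarity attribution on a toy
# linear model. The model, data, scoring rule, and weight values are all
# hypothetical; in particular, the weights here are hand-picked, not learned.

def loss_gradient(theta, x, y):
    """Gradient of the squared error 0.5 * (theta . x - y)^2 w.r.t. theta."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return [residual * xi for xi in x]

def attribution_score(grad_train, grad_query, weights):
    """Weighted inner product: sum_i w_i * g_train[i] * g_query[i]."""
    return sum(w * a * b for w, a, b in zip(weights, grad_train, grad_query))

def rank_training_examples(theta, train_set, query, weights):
    """Return (indices sorted from highest to lowest score, raw scores)."""
    g_query = loss_gradient(theta, *query)
    scores = [
        attribution_score(loss_gradient(theta, x, y), g_query, weights)
        for x, y in train_set
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

theta = [0.5, -0.25]                      # toy model parameters
train_set = [([1.0, 0.0], 1.0),           # (input, target) pairs
             ([0.0, 1.0], 1.0),
             ([1.0, 1.0], 0.0)]
query = ([1.0, 1.0], 2.0)                 # the output being attributed

uniform = [1.0, 1.0]                      # every parameter treated equally
reweighted = [1.0, 0.1]                   # stand-in for learned importance weights

uniform_order, uniform_scores = rank_training_examples(
    theta, train_set, query, uniform)
reweighted_order, reweighted_scores = rank_training_examples(
    theta, train_set, query, reweighted)

print("uniform ranking:   ", uniform_order)
print("reweighted ranking:", reweighted_order)
```

In this toy setting, down-weighting the second parameter changes which training example ranks first, which is the kind of effect that explicit parameter weighting is meant to capture. How such weights would be learned without labels is not shown in this sketch.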