Large-scale black-box models have become ubiquitous across numerous applications. Understanding how individual training data sources influence the predictions of these models is crucial for improving their trustworthiness. Existing influence estimation techniques require either computing gradients for every training point or repeatedly retraining the model on different data subsets. Both approaches face substantial computational challenges when scaled to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, which highlights the reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent but inverse problem: assessing how the predictions on training samples would change if the model were trained on specific test samples. Through both empirical and theoretical validation, we demonstrate the wide applicability of our hypothesis. Building on it, we introduce a new method for estimating the influence of training data that requires computing gradients only for the specific test samples of interest, together with a forward pass for each training point. This approach exploits a common asymmetry, namely that the number of test samples examined at once is much smaller than the size of the training dataset, and thus achieves a significant efficiency improvement over existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
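To make the cost asymmetry concrete, the following is a minimal sketch of the "mirrored" direction, under assumptions that are ours and not taken from the paper: a toy logistic-regression model in NumPy, a plain gradient-descent update on the test sample(s), and the drop in each training point's loss as its influence score. The names `mirrored_influence`, `logistic_loss`, and `logistic_grad`, as well as the learning rate and step count, are hypothetical illustrations; this is not the authors' Forward-INF implementation from the repository above.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Per-example logistic loss for labels y in {0, 1} (a forward pass over the rows of X)."""
    z = X @ w
    # log(1 + exp(z)) - y * z, computed stably via logaddexp
    return np.logaddexp(0.0, z) - y * z


def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss over the rows of X with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)


def mirrored_influence(w, X_train, y_train, X_test, y_test, lr=0.1, steps=1):
    """Hypothetical sketch: score each training point by how much its loss drops
    after briefly training the model on the test sample(s).

    Gradients are computed only on the (few) test samples; the training set
    only ever sees forward passes.
    """
    # Forward pass over the training set at the original parameters
    # (this can be cached once and reused across many test queries).
    loss_before = logistic_loss(w, X_train, y_train)

    # Train a copy of the model on the test samples only.
    w_test = w.copy()
    for _ in range(steps):
        w_test = w_test - lr * logistic_grad(w_test, X_test, y_test)

    # Second forward pass over the training set at the updated parameters.
    loss_after = logistic_loss(w_test, X_train, y_train)

    # Positive score: fitting the test sample lowers this training point's loss.
    return loss_before - loss_after


# Usage on synthetic data: query with a copy of the first training point.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y_train = (X_train @ w_true > 0).astype(float)
w = np.zeros(5)

X_test = X_train[:1].copy()
y_test = y_train[:1].copy()
scores = mirrored_influence(w, X_train, y_train, X_test, y_test)
```

The only gradient computation touches the single test sample, while the 100 training points are scored with two vectorized forward passes; with many training points and few test queries, this is the asymmetry the abstract describes.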