The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on the predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques either compute gradients for every training point or retrain the model repeatedly on different subsets. Both approaches face significant computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, which highlights the reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Building on this hypothesis, we introduce a new method for estimating the influence of training data that requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach capitalizes on a common asymmetry: the number of test samples under concurrent examination is typically much smaller than the size of the training dataset, which yields a significant improvement in efficiency over existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
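To make the idea of "gradients for the test sample, forward passes for the training points" concrete, here is a minimal sketch on a toy logistic-regression model. It is an illustration under assumptions, not the Forward-INF implementation: the model, the single gradient step, the learning rate `lr`, the sign convention of the score, and all function names (`loss`, `grad`, `mirrored_influence_scores`) are hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a logistic model on one example (forward pass only)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def grad(w, x, y):
    """Gradient of the loss above with respect to w for one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def mirrored_influence_scores(w, train_set, test_point, lr=0.1):
    """Score each training point by how its loss changes after the model
    takes one gradient step on the test point (assumed update rule).

    Only the test point needs a gradient; every training point needs
    two forward passes (before and after the update).
    """
    x_test, y_test = test_point
    g = grad(w, x_test, y_test)  # the single gradient computation
    w_updated = [wi - lr * gi for wi, gi in zip(w, g)]
    # Assumed convention: positive score = training loss decreased,
    # i.e. the training point "agrees" with the test point.
    return [loss(w, x, y) - loss(w_updated, x, y) for x, y in train_set]


if __name__ == "__main__":
    w0 = [0.0, 0.0]
    train = [
        ([1.0, 0.0], 1),  # same features and label as the test point
        ([1.0, 0.0], 0),  # same features, opposite label
        ([0.0, 1.0], 1),  # orthogonal features
    ]
    test = ([1.0, 0.0], 1)
    for (x, y), s in zip(train, mirrored_influence_scores(w0, train, test)):
        print(x, y, s)
```

In this sketch the cost is one gradient per test sample plus two forward passes per training point, which is where the asymmetry described above pays off when test samples are few and training points are many.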
