Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
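To make the abstract's recipe concrete, the following is a minimal PyTorch sketch of the mirrored estimate under simplifying assumptions: a single probe gradient step on the test samples, a generic per-batch `loss_fn`, and training points supplied as batched `(x, y)` pairs. The function name `mirrored_influence` and the step size `lr` are illustrative; this is a sketch of the idea, not the released Forward-INF implementation.

```python
import torch

def mirrored_influence(model, loss_fn, test_batch, train_points, lr=1e-3):
    """Score each training point by how its loss changes after a
    single gradient step on the *test* samples (the mirrored view):
    a large loss drop suggests high influence on the test predictions."""
    x_test, y_test = test_batch

    # Snapshot parameters so the probe update can be undone afterwards.
    state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    # Loss on every training point BEFORE the test-side update (forward only).
    with torch.no_grad():
        before = [loss_fn(model(x), y).item() for x, y in train_points]

    # One gradient step on the test samples: the only backward pass needed.
    model.zero_grad()
    loss_fn(model(x_test), y_test).backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad

    # Loss on every training point AFTER the update (forward only).
    with torch.no_grad():
        after = [loss_fn(model(x), y).item() for x, y in train_points]

    model.load_state_dict(state)  # restore the original parameters

    # Positive score: the training point's loss dropped when the model
    # "trained on" the test samples, i.e., high mirrored influence.
    return [b - a for b, a in zip(before, after)]
```

The asymmetry described in the abstract shows up directly in this sketch: the single backward pass over the small set of test samples is amortized across forward-only evaluations of the much larger training set.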