Influence functions aim to quantify the impact of individual training data points on a model's predictions. While extensive research has been conducted on influence functions in traditional machine learning models, their application to large language models (LLMs) has been limited. In this work, we conduct a systematic study to address a key question: do influence functions work on LLMs? Specifically, we evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings. Our further investigation reveals that their poor performance can be attributed to: (1) inevitable approximation errors when estimating the inverse-Hessian-vector product (iHVP) component due to the scale of LLMs, (2) uncertain convergence during fine-tuning, and, more fundamentally, (3) the definition itself, as changes in model parameters do not necessarily correlate with changes in LLM behavior. Our study thus suggests the need for alternative approaches for identifying influential samples. To support future work, our code is made available at https://github.com/plumprc/Failures-of-Influence-Functions-in-LLMs.