Pre-trained large language models (LLMs) are commonly fine-tuned to adapt them to downstream tasks. Since the majority of their knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data can provide valuable insights. Influence functions have been proposed as a means of explaining model predictions in terms of training data. However, existing approaches cannot compute ``multi-stage'' influence and do not scale to billion-parameter LLMs. In this paper, we propose the multi-stage influence function, which attributes the downstream predictions of fine-tuned LLMs to pre-training data under the full-parameter fine-tuning paradigm. To make the multi-stage influence function efficient and practical, we leverage the Eigenvalue-corrected Kronecker-Factored (EK-FAC) parameterization for efficient approximation. Empirical results validate the superior scalability of the EK-FAC approximation and the effectiveness of our multi-stage influence function. Additionally, case studies on a real-world LLM, dolly-v2-3b, demonstrate its interpretive power, with examples illustrating the insights that multi-stage influence estimates provide. Our code is publicly available at https://github.com/colored-dye/multi_stage_influence_function.
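For background, the classic single-stage influence function (which the multi-stage formulation extends) estimates how upweighting a training example $z$ would change the loss on a test example $z_{\text{test}}$; the notation below is the standard formulation, not the paper's multi-stage variant:

\[
\mathcal{I}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta}),
\]

where $\hat{\theta}$ denotes the trained parameters and $H_{\hat{\theta}}$ is the Hessian of the training loss. Inverting $H_{\hat{\theta}}$ exactly is infeasible at billion-parameter scale, which is the bottleneck the EK-FAC approximation addresses by replacing the Hessian with a Kronecker-factored, eigenvalue-corrected surrogate.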