Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples, and the application of these methods to large language model (LLM) outputs could significantly advance model transparency and data curation. However, it has been challenging to date to apply these methods to the full scale of LLM pretraining. In this paper, we refine existing gradient-based methods to work effectively at scale, allowing us to retrieve influential examples for an 8B-parameter language model from a pretraining corpus of over 160B tokens with no need for subsampling or pre-filtering. Our method combines several techniques, including optimizer state correction, a task-specific Hessian approximation, and normalized encodings, which we find to be critical for performance at scale. In quantitative evaluations on a fact tracing task, our method performs best at identifying examples that influence model predictions, but classical, model-agnostic retrieval methods such as BM25 still perform better at finding passages which explicitly contain relevant facts. These results demonstrate a misalignment between factual *attribution* and causal *influence*. With increasing model size and training tokens, we find that influence more closely aligns with factual attribution. Finally, we examine different types of examples identified as influential by our method, finding that while many directly entail a particular fact, others support the same output by reinforcing priors on relation types, common entities, and names. We release our prompt set and model outputs, along with a web-based visualization tool to explore influential examples for factual predictions, commonsense reasoning, arithmetic, and open-ended generation for an 8B-parameter LLM.
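As a rough illustration (not the paper's implementation), gradient-based influence retrieval of the kind described above can be sketched as a similarity search over gradient encodings of the query and the training examples. All names below are hypothetical, and the optimizer-state correction and Hessian approximation the paper relies on are omitted; only the normalization of encodings is shown.

```python
import numpy as np

def influence_scores(query_grad, train_grads, normalize=True):
    """Score training examples by gradient similarity to a query.

    Minimal TDA sketch: a training example's influence is approximated
    by the dot product between its gradient encoding and the query's
    gradient encoding. The paper's optimizer-state and task-specific
    Hessian corrections are deliberately left out of this toy version.
    """
    q = np.asarray(query_grad, dtype=np.float64)
    G = np.asarray(train_grads, dtype=np.float64)
    if normalize:  # unit-normalize encodings (the paper finds this critical at scale)
        q = q / np.linalg.norm(q)
        G = G / np.linalg.norm(G, axis=1, keepdims=True)
    return G @ q  # one influence score per training example

# toy usage: rank training examples by influence on the query
scores = influence_scores([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
top_indices = np.argsort(scores)[::-1]  # most influential first
```

In practice the encodings would be low-dimensional projections of per-example gradients rather than raw gradients, since storing full gradients for 160B tokens is infeasible.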