Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs such an ability, we explore source-aware training -- a recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning stage to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training borrows from existing pretraining/fine-tuning frameworks and requires minimal changes to the model architecture or implementation. Through experiments on synthetic data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's perplexity compared to standard pretraining. Our findings also highlight the importance of pretraining data augmentation in achieving attribution. Code and data are available at: \url{https://github.com/mukhal/intrinsic-source-citation}
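To make the recipe concrete, here is a minimal sketch of how stage (i) and the data augmentation it relies on might look when constructing pretraining sequences. This is an illustrative assumption, not the paper's actual implementation: the `<ID>` separator token, the function names, and the sentence-level augmentation strategy are all hypothetical stand-ins for whatever the released code does.

```python
# Hypothetical sketch of source-aware pretraining data construction.
# Assumptions (not from the paper): the <ID> token, function names,
# and sentence-level augmentation are illustrative choices.

DOC_ID_TOKEN = "<ID>"  # assumed special token separating content from its source ID


def with_source_id(text: str, doc_id: str) -> str:
    """Append a unique source identifier so the LM learns, via the standard
    next-token objective, to emit the identifier after the content."""
    return f"{text} {DOC_ID_TOKEN} {doc_id}"


def augment(doc_text: str, doc_id: str) -> list[str]:
    """Data augmentation: besides the full document, also pair individual
    sentences with the ID, so attribution generalizes beyond verbatim
    full-document contexts."""
    sentences = [s.strip() for s in doc_text.split(".") if s.strip()]
    return [with_source_id(doc_text, doc_id)] + [
        with_source_id(s, doc_id) for s in sentences
    ]


# Toy corpus: one synthetic document with a unique identifier.
corpus = {"doc_001": "The capital of Atlantis is Poseidonia. It lies under the sea."}
training_sequences = [
    seq for did, text in corpus.items() for seq in augment(text, did)
]
```

Each resulting sequence ends with the document's identifier, so a model trained on them can be prompted after fine-tuning (stage ii) to produce the supporting source for a fact it states.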