Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs this ability, we explore source-aware training, a post-pretraining recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning stage that teaches the LLM to cite a supporting pretraining source when prompted. Source-aware training can easily be applied to off-the-shelf pretrained LLMs and diverges minimally from existing pretraining/fine-tuning frameworks. Through experiments on carefully curated data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's quality compared to standard pretraining. Our results also highlight the importance of data augmentation in achieving attribution. Code and data are available at \url{https://github.com/mukhal/intrinsic-source-citation}
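The two stages of the recipe can be illustrated with a minimal sketch of how training examples might be constructed. This is not the authors' implementation: the `Doc` type, the `<id>...</id>` marker, the placement of the identifier after the document text, and the prompt template are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str  # hypothetical unique source identifier
    text: str


def id_injected_example(doc: Doc) -> str:
    """Stage (i): pair a document's text with its identifier so that a
    language-modeling loss can associate the two. Appending the ID after
    the text is an assumption, not a detail taken from the abstract."""
    return f"{doc.text} <id>{doc.doc_id}</id>"


def citation_example(question: str, answer: str, doc: Doc) -> dict:
    """Stage (ii): build an instruction-tuning pair whose target gives the
    answer and then cites the supporting document. The prompt template
    is hypothetical."""
    return {
        "prompt": f"Q: {question}\nA:",
        "target": f" {answer} <id>{doc.doc_id}</id>",
    }


doc = Doc(doc_id="doc_0042", text="The Eiffel Tower is in Paris.")
stage_one = id_injected_example(doc)
stage_two = citation_example("Where is the Eiffel Tower?", "Paris.", doc)
```

In this sketch, stage (i) examples would be used for continued next-token training, and stage (ii) examples for supervised fine-tuning on prompt/target pairs.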