In many practical applications, large language models (LLMs) need to incorporate new knowledge that is not present in their pre-training data. The primary methods for this are fine-tuning and retrieval-augmented generation (RAG). Although RAG has emerged as the industry standard for knowledge injection, fine-tuning has not yet achieved comparable success. In this paper, we propose a new fine-tuning technique for learning new knowledge and show that it can match the performance of RAG. The proposed method, which we call prompt distillation, is based on self-distillation. First, we generate question-answer pairs about the new knowledge. Then, we fine-tune a student model on the question-answer pairs to imitate the output distributions of a teacher model, which additionally receives the new knowledge in its prompt. The student model is identical to the teacher, except it is equipped with a LoRA adapter. This training procedure distills the new knowledge from the teacher's prompt into the student's weights.
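The distillation objective described above can be sketched as a token-level KL divergence between the teacher's and student's next-token distributions over the answer tokens. This is a minimal illustrative sketch, not the paper's exact loss: the function names, the temperature parameter, and the per-token averaging are assumptions; in practice the logits would come from the frozen teacher (with the knowledge prepended to its prompt) and the LoRA-equipped student (without it).

```python
import math

def softmax(logits, temperature=1.0):
    """Convert a list of logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q): divergence of student distribution q from teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def prompt_distillation_loss(teacher_logits_seq, student_logits_seq, temperature=1.0):
    """Average token-level KL over the answer tokens of one QA pair.

    teacher_logits_seq: per-token vocabulary logits from the teacher,
        which sees the new knowledge in its prompt.
    student_logits_seq: per-token logits from the student (same base
        model plus a LoRA adapter), which does not see the knowledge.
    (Hypothetical interface for illustration.)
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_logits_seq, student_logits_seq):
        p = softmax(t_logits, temperature)
        q = softmax(s_logits, temperature)
        total += kl_divergence(p, q)
    return total / len(teacher_logits_seq)
```

Minimizing this loss with respect to the LoRA parameters pushes the student's answer distribution toward what the teacher produces when it can read the new knowledge, which is how the knowledge moves from the prompt into the adapter weights.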