In this paper, we propose LLM-Neo, a novel framework that efficiently transfers knowledge from a large language model (LLM) teacher to a compact student. We first revisit knowledge distillation (KD) and low-rank adaptation (LoRA), and argue that they share the same paradigm. Motivated by this observation, we explore a strategy that combines LoRA and KD to improve the efficiency of knowledge transfer. We summarize several guidelines for this design and develop LLM-Neo accordingly. Experimental results on compressing Llama 2 and Llama 3 show that LLM-Neo outperforms various baselines. Further analysis demonstrates the robustness of LLM-Neo across LoRA variants. The trained models are available at \href{https://huggingface.co/collections/yang31210999/llm-neo-66e3c882f5579b829ff57eba}{this repository}.