Transferring the reasoning capability of stronger large language models (LLMs) to smaller ones is appealing, because smaller LLMs are more flexible and less expensive to deploy. Among existing solutions, knowledge distillation stands out for its efficiency and generalization. However, existing distillation methods suffer from several drawbacks, including limited knowledge diversity and a lack of rich contextual information. To address these problems and facilitate the learning of compact language models, we propose TinyLLM, a novel knowledge distillation paradigm that learns a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM not only to generate correct answers but also to understand the rationales behind them. Because different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can significantly outperform the large teacher LLMs despite having a considerably smaller model size.
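As a rough illustration of a student that learns both answers and rationales from several teachers, the Python sketch below combines an answer loss with the average rationale loss across teachers. The function names, the equal weighting of teachers, the `alpha` weight, and the use of plain per-token probabilities in place of a real student model's outputs are all hypothetical assumptions for illustration; this is not claimed to be TinyLLM's actual training objective.

```python
import math


def nll(token_probs):
    """Mean negative log-likelihood of the target tokens, given the
    student's probability for each target token (hypothetical input)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multi_teacher_loss(answer_probs, rationale_probs_per_teacher, alpha=1.0):
    """Hypothetical distillation objective (an assumption, not the paper's):
    loss on the correct answer plus alpha times the rationale loss averaged
    uniformly over all teachers' rationales."""
    answer_loss = nll(answer_probs)
    rationale_loss = sum(
        nll(probs) for probs in rationale_probs_per_teacher
    ) / len(rationale_probs_per_teacher)
    return answer_loss + alpha * rationale_loss


# Example: the student assigns probability 0.5 to the answer token, and
# probabilities 1.0 and 0.5 to the rationale tokens of two teachers.
print(multi_teacher_loss([0.5], [[1.0], [0.5]]))
```

Averaging over teachers keeps the rationale term on the same scale regardless of how many teachers are used; a real system might instead learn or tune per-teacher weights.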