Despite the remarkable performance of large language models (LLMs) on recent NLP tasks, their deployment is challenging because of high computational and memory demands. Recent research has focused on improving smaller open-source models through knowledge distillation from LLMs, which reduces computational cost with promising results. Nevertheless, the distilled models frequently fall short of LLM-level performance, particularly on tasks that demand advanced reasoning. In this work, we introduce the \textbf{Mixed Distillation} framework, which draws on both the Program-of-Thought (PoT) and Chain-of-Thought (CoT) capabilities of LLMs and distills them into smaller models. Within the framework, PoT is dedicated to improving the reasoning results generated by smaller models, while CoT simultaneously optimizes those results. Mixed Distillation thereby narrows the gap between smaller models and LLMs and yields better performance across various tasks. Specifically, on the SVAMP dataset, 7-billion-parameter Llama2 and CodeLlama models distilled with our framework not only surpass single-path distillation methods but also outperform the LLM (GPT-3.5-turbo) in reasoning accuracy. With sampling over multiple reasoning paths, the two models reach accuracies of 85% and 85.5%, respectively, an improvement over previous distillation methods.