With the rise of powerful closed-source LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-source LLMs into smaller open-source LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers for the student model to learn from. However, such a standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process in which the student first attempts to solve a task, and the teacher then provides an adaptive refinement for the student to improve. Instead of feeding the student the teacher's prior, personalised distillation enables personalised learning for the student model: it learns only from examples on which it makes mistakes, and it learns to improve its own solution. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples, collected at a cost of $4-6, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.
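The data-collection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that a "mistake" is detected by running unit tests against the student's attempt, which the abstract does not state. The names `Task`, `PersonalisedExample`, `run_tests`, `collect_personalised_data`, `toy_student`, and `toy_teacher` are all hypothetical, and the student and teacher are stand-in functions rather than real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    instruction: str
    tests: str  # assumption: correctness is defined by unit tests given as source code


@dataclass
class PersonalisedExample:
    instruction: str
    student_attempt: str
    feedback: str
    refined_solution: str


def run_tests(solution: str, tests: str) -> Optional[str]:
    """Return None if the solution passes the tests, else an error message."""
    namespace: dict = {}
    try:
        exec(solution, namespace)  # define the candidate function(s)
        exec(tests, namespace)     # run assertions against them
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def collect_personalised_data(
    tasks: List[Task],
    student: Callable[[str], str],
    teacher: Callable[[str, str, str], str],
) -> List[PersonalisedExample]:
    """Keep only tasks the student fails, paired with the teacher's refinement."""
    data: List[PersonalisedExample] = []
    for task in tasks:
        attempt = student(task.instruction)       # the student tries first
        error = run_tests(attempt, task.tests)
        if error is None:
            continue                              # nothing to learn: already solved
        refined = teacher(task.instruction, attempt, error)  # adaptive refinement
        if run_tests(refined, task.tests) is None:           # keep verified fixes only
            data.append(PersonalisedExample(task.instruction, attempt, error, refined))
    return data


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_student(instruction: str) -> str:
    if "add" in instruction:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy
    return "def double(x):\n    return 2 * x\n"


def toy_teacher(instruction: str, attempt: str, error: str) -> str:
    return attempt.replace("a - b", "a + b")  # minimally edits the student's own code


tasks = [
    Task("Write add(a, b).", "assert add(2, 3) == 5"),
    Task("Write double(x).", "assert double(4) == 8"),
]
data = collect_personalised_data(tasks, toy_student, toy_teacher)
```

In this sketch the second task is dropped because the student already solves it, so the resulting dataset contains only the refinement of the student's own failed attempt. Filtering the teacher's output through the same tests is a design choice of the sketch, made so that no incorrect refinement enters the training data.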