The emergence of Large Language Models (LLMs) has brought to light promising language generation capabilities, particularly in tasks such as complex reasoning and creative writing. Consequently, distillation through imitation of teacher responses has emerged as a popular technique for transferring knowledge from LLMs to more accessible Small Language Models (SLMs). While this works well for simpler tasks, a substantial performance gap remains on tasks requiring intricate language comprehension and creativity, such as humor generation. We hypothesize that this gap may stem from creative tasks being hard to learn by imitation alone, and we explore whether supplementary guidance from the teacher could yield higher performance. To test this, we study the effect of assigning a dual role to the LLM: as a "teacher" generating data and as a "critic" evaluating the student's performance. Our experiments on humor generation reveal that incorporating feedback significantly narrows the performance gap between SLMs and their larger counterparts compared to relying on imitation alone. Our research thus highlights the potential of using feedback as an additional dimension to data when transferring complex language abilities via distillation.