Knowledge distillation from large language models (LLMs) is essential for the efficient deployment of language models. Prior work has proposed using LLMs to generate data for training distilled models. We argue that LLM-generated data tends to be sampled mainly from the center of the original content distribution. This limitation prevents the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution (OOD)-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. We also introduce an energy-based OOD evaluation approach to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM by an average of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.