Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly generated instructions. In addition to being sizable, our instructions are designed to cover a broad set of topics to ensure diversity. A thorough investigation of our instruction data demonstrates its diversity, and we generate responses for these instructions using gpt-3.5-turbo. We then use the instructions to fine-tune a host of models, dubbed LaMini-LM, of varying sizes, from both the encoder-decoder and the decoder-only families. We evaluate our models both automatically (on 15 different NLP benchmarks) and manually. Results show that our proposed LaMini-LM models are on par with competitive baselines while being nearly 10 times smaller in size.