LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly generated instructions. In addition to being sizeable, our instructions are designed to cover a broad set of topics to ensure diversity. A thorough investigation of our instruction data demonstrates its diversity, and we generate responses for these instructions using gpt-3.5-turbo. We then use the instructions to fine-tune a host of models, dubbed LaMini-LM, of varying sizes, from both the encoder-decoder and the decoder-only families. We evaluate our models both automatically (on 15 different NLP benchmarks) and manually. Results show that our proposed LaMini-LM models are on par with competitive baselines while being nearly 10 times smaller in size.
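The data-construction step described above (pairing each instruction with a teacher-generated response) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names `build_distillation_set` and `toy_teacher` are hypothetical, the teacher is a local stub rather than a real gpt-3.5-turbo call, and the whitespace/case deduplication is an assumption added for the example.

```python
from typing import Callable, Dict, Iterable, List


def build_distillation_set(
    instructions: Iterable[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each unique instruction with a response produced by the teacher."""
    seen = set()
    pairs: List[Dict[str, str]] = []
    for instruction in instructions:
        text = instruction.strip()
        # Assumption: treat instructions that differ only in case or spacing as duplicates.
        key = " ".join(text.lower().split())
        if not key or key in seen:
            continue  # skip empty or repeated instructions
        seen.add(key)
        pairs.append({"instruction": text, "response": teacher(text)})
    return pairs


def toy_teacher(instruction: str) -> str:
    # Hypothetical stand-in for querying the teacher model.
    return f"[teacher response to: {instruction}]"


pairs = build_distillation_set(
    ["Summarize this paragraph.", "summarize  this paragraph.", "List three fruits."],
    toy_teacher,
)
```

The resulting instruction-response pairs are the kind of supervised data on which a smaller student model would then be fine-tuned; swapping `toy_teacher` for a real model call is left to the reader.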

