Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so using less training data than finetuning or distillation requires. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks. First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
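The multi-task idea in the abstract (rationales as additional supervision alongside labels) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration, not taken from the abstract: the `Example` container, the `[label]` and `[rationale]` task prefixes, the helper names, and the `weight` hyperparameter that balances the two losses. It shows only how one training example might be expanded into two tasks and how their losses might be combined, not the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    text: str        # task input
    label: str       # human or LLM-generated label
    rationale: str   # LLM-generated rationale for the label


def build_multitask_pairs(ex: Example) -> List[Tuple[str, str]]:
    """Expand one example into two (source, target) pairs, one per task.

    The "[label]" and "[rationale]" prefixes are hypothetical task markers
    that tell the small model which output to produce.
    """
    return [
        ("[label] " + ex.text, ex.label),
        ("[rationale] " + ex.text, ex.rationale),
    ]


def multitask_loss(label_loss: float, rationale_loss: float,
                   weight: float = 1.0) -> float:
    """Weighted sum of the two task losses.

    `weight` is an assumed hyperparameter controlling how much the
    rationale task contributes relative to label prediction.
    """
    return label_loss + weight * rationale_loss
```

In this sketch the rationale task is used only during training; at inference time the small model would be queried with the label prefix alone, so producing rationales adds no cost at deployment.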

