Large language models (LLMs) have demonstrated remarkable generalizability, such as understanding arbitrary entities and relations. Instruction tuning has proven effective for distilling LLMs into more cost-efficient models such as Alpaca and Vicuna. Yet such student models still trail the original LLMs by large margins in downstream applications. In this paper, we explore targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class such as open information extraction. Using named entity recognition (NER) as a case study, we show how ChatGPT can be distilled into much smaller UniversalNER models for open NER. For evaluation, we assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains such as biomedicine, programming, social media, law, and finance. Without using any direct supervision, UniversalNER attains remarkable NER accuracy across tens of thousands of entity types, outperforming general instruction-tuned models such as Alpaca and Vicuna by over 30 absolute F1 points on average. With a tiny fraction of ChatGPT's parameters, UniversalNER not only acquires its capability of recognizing arbitrary entity types, but also exceeds its NER accuracy by 7-9 absolute F1 points on average. Remarkably, UniversalNER even outperforms, by a large margin, state-of-the-art multi-task instruction-tuned systems such as InstructUIE, which uses supervised NER examples. We also conduct thorough ablation studies to assess the impact of various components of our distillation approach. We will release the distillation recipe, data, and UniversalNER models to facilitate future research on targeted distillation.