In this paper, we introduce the oBERTa range of language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models that are between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings, improved distillation, and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and we find RoBERTa less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.
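Knowledge distillation is one of the compression ingredients named above. As a rough illustration only, the sketch below implements the generic Hinton-style distillation loss for a single example: a hard-label cross-entropy term mixed with a temperature-softened KL divergence between teacher and student. This is a standard formulation, not necessarily the exact objective used for oBERTa; the function names `softmax` and `distillation_loss` and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic (Hinton-style) KD loss for one example; illustrative only.

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    soft KL(teacher || student) term, which is scaled by temperature**2 so
    its gradient magnitude stays comparable across temperatures.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # Soft targets: KL divergence between softened teacher and student.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(t * (math.log(t) - math.log(s))
               for t, s in zip(p_t, p_s) if t > 0)

    return alpha * hard + (1 - alpha) * (temperature ** 2) * soft
```

When the student already reproduces the teacher's logits, the KL term vanishes and only the weighted hard-label loss remains. In practice the teacher would be a larger dense model and the loss would be averaged over a batch; the hyperparameter values here are placeholders, not values reported for oBERTa.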