Large language models have become the core architecture on which most modern natural language processing (NLP) systems are built. These models consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To reduce this cost, recent work has explored structured and unstructured pruning, quantization, and distillation to improve inference speed and decrease model size. This paper studies how models pruned using Gradual Unstructured Magnitude Pruning transfer between domains and tasks. Our experiments show that models pruned during general-domain masked language model pretraining can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model, Sparse*BERT, can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10% of the parameters.
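As a rough illustration of what gradual unstructured magnitude pruning does, the sketch below zeroes the smallest-magnitude weights of a flat weight list in stages until a target sparsity is reached. It is a minimal sketch under stated assumptions, not the procedure used in this work: the cubic sparsity schedule, the function names (`target_sparsity`, `magnitude_prune`, `gradual_prune`), and the step counts are all hypothetical, and the training updates that would normally happen between pruning steps are omitted.

```python
def target_sparsity(step, start_step, end_step, final_sparsity, initial_sparsity=0.0):
    """Assumed cubic schedule: sparsity rises from initial to final between the two steps."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude entries set to zero."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value; already-pruned zeros sort first and stay pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def gradual_prune(weights, total_steps, final_sparsity):
    """Prune a little more at every step until final_sparsity is reached."""
    for step in range(total_steps + 1):
        sparsity = target_sparsity(step, 0, total_steps, final_sparsity)
        weights = magnitude_prune(weights, sparsity)
        # A real training loop would update the surviving weights here
        # (for example, with masked language modeling gradients).
    return weights


if __name__ == "__main__":
    toy_weights = [(-1) ** i * (i + 1) / 100 for i in range(100)]
    sparse = gradual_prune(toy_weights, total_steps=10, final_sparsity=0.9)
    print(sum(1 for w in sparse if w == 0.0), "of", len(sparse), "weights are zero")
```

Because pruning is unstructured, individual weights are removed wherever they are smallest rather than in whole rows, heads, or layers; a 90% target here mirrors the "10% of the parameters" figure quoted above.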