Multi-task language models achieve outstanding performance on a variety of natural language understanding tasks with a single model. However, these models carry far more parameters than are needed when they are used for only one specific task. This paper proposes a novel training-free compression method for multi-task language models based on pruning. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We then prune the neurons that are unimportant for that task, leaving only task-specific parameters. Furthermore, we extend our method so that it is applicable in low-resource and unsupervised settings. Because our compression method is training-free, it requires few computing resources and does not destroy the pre-trained knowledge of the language model. Experimental results on six widely used datasets show that our proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen-domain setting.
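The general idea of attribution-based, training-free pruning can be illustrated with a minimal sketch. The abstract does not state which attribution method is used, so the example below assumes a simple stand-in (hidden activation multiplied by the gradient of the target-class logit) on a toy one-hidden-layer ReLU network. All function names (`hidden`, `logits`, `neuron_attributions`, `prune`) and the `keep_ratio` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Example = Tuple[List[float], int]  # (input vector, target class index)


def hidden(W1: Matrix, b1: List[float], x: List[float]) -> List[float]:
    """ReLU hidden activations; W1 has shape (n_inputs, n_hidden)."""
    return [
        max(0.0, sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
        for j in range(len(b1))
    ]


def logits(W1: Matrix, b1: List[float], W2: Matrix, x: List[float]) -> List[float]:
    """Class logits; W2 has shape (n_hidden, n_classes)."""
    h = hidden(W1, b1, x)
    return [sum(h[j] * W2[j][c] for j in range(len(h))) for c in range(len(W2[0]))]


def neuron_attributions(
    W1: Matrix, b1: List[float], W2: Matrix, examples: List[Example]
) -> List[float]:
    """Assumed attribution: activation * d(logit_y)/d(h_j), averaged over examples.

    For this linear output layer, d(logit_y)/d(h_j) is simply W2[j][y].
    """
    scores = [0.0] * len(b1)
    for x, y in examples:
        for j, hj in enumerate(hidden(W1, b1, x)):
            scores[j] += hj * W2[j][y]
    return [s / len(examples) for s in scores]


def prune(
    W1: Matrix, b1: List[float], W2: Matrix, scores: List[float], keep_ratio: float
) -> Tuple[Matrix, List[float], Matrix, List[int]]:
    """Keep the top `keep_ratio` fraction of hidden neurons by score; no retraining."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])
    keep = sorted(ranked[-k:])
    W1_p = [[row[j] for j in keep] for row in W1]  # drop pruned columns
    b1_p = [b1[j] for j in keep]
    W2_p = [W2[j] for j in keep]                   # drop pruned rows
    return W1_p, b1_p, W2_p, keep
```

In this sketch the scores are computed from a handful of labeled task examples and the weights are only sliced, never updated, which is what makes the procedure training-free. How the scores would be obtained in the low-resource or unsupervised settings mentioned above is not covered here.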