The outstanding performance and growing size of Large Language Models have led to increased attention to parameter-efficient learning. The two predominant approaches are Adapters and Pruning. Adapters freeze the model and attach a new weight matrix alongside it, which significantly reduces training time and memory, at the cost of higher time and memory consumption during evaluation and testing. Pruning removes some of the weights and redistributes the remaining ones, which makes evaluation and testing relatively cheap, at the cost of more complex training with extremely high memory usage and training time. As a result, training efficiency and inference efficiency cannot be obtained at the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieves high memory efficiency, speeds up training, and incurs no significant decrease in accuracy on GLUE tasks, thereby attaining training and inference efficiency at the same time.
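To make the trade-off concrete, the following is a minimal NumPy sketch of one linear layer that combines the two ideas: the pretrained weight is pruned and frozen, and only a small side matrix is trained. It is an illustration under stated assumptions, not the method proposed in this work: the low-rank form of the side matrix, the magnitude-based pruning criterion, and the names `magnitude_mask` and `PrunedAdapterLinear` are all hypothetical choices made for the example.

```python
import numpy as np


def magnitude_mask(weight, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-magnitude entries.

    Magnitude pruning is an assumption of this sketch, chosen for simplicity.
    """
    k = int(round(sparsity * weight.size))
    if k == 0:
        return np.ones_like(weight)
    # k-th smallest absolute value; entries at or below it are pruned.
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    return (np.abs(weight) > threshold).astype(weight.dtype)


class PrunedAdapterLinear:
    """Frozen, pruned weight plus a small trainable low-rank side path (hypothetical)."""

    def __init__(self, weight, rank, sparsity, rng):
        d_out, d_in = weight.shape
        self.mask = magnitude_mask(weight, sparsity)
        self.weight = weight * self.mask                 # frozen, sparse
        # Trainable side matrices; `up` starts at zero so the layer
        # initially reproduces the pruned base layer exactly.
        self.down = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.up = np.zeros((d_out, rank))

    def forward(self, x):
        base = x @ self.weight.T                         # frozen path
        side = (x @ self.down.T) @ self.up.T             # adapter path
        return base + side

    def trainable_params(self):
        return self.down.size + self.up.size


rng = np.random.default_rng(0)
pretrained = rng.standard_normal((64, 64))
layer = PrunedAdapterLinear(pretrained, rank=4, sparsity=0.5, rng=rng)
x = rng.standard_normal((8, 64))
y = layer.forward(x)

print("trainable parameters:", layer.trainable_params())
print("nonzero frozen weights:", int(layer.mask.sum()), "of", pretrained.size)
```

In this toy setting only the two small side matrices would receive gradients, while the frozen path holds half as many nonzero weights as the dense layer. How the pruning is made task-oriented and how accuracy is preserved are not specified in the abstract and are not modeled here.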