As large pre-trained language models (PLMs) proliferate, fine-tuning all model parameters becomes increasingly inefficient, particularly when many downstream tasks each incur substantial training and storage costs. Several approaches to parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method that incorporates trainable rank decomposition matrices into each target module. However, LoRA does not account for the varying importance of each layer. To address this limitation, we introduce PRILoRA, which assigns each layer a different rank, increasing linearly from layer to layer, and performs pruning throughout the training process, taking into account both the current magnitude of the weights and the accumulated statistics of the input to each layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmark tasks, setting a new state of the art.
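The two ingredients described above, a linearly increasing rank per layer and pruning guided by weight magnitude together with accumulated input statistics, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the function names, the rank bounds, the exponential moving average used to accumulate input statistics, and the choice to score each weight as its absolute value multiplied by the statistic of its input feature.

```python
import math


def linear_ranks(num_layers, r_min, r_max):
    """Assign each layer a rank that grows linearly from r_min to r_max.

    r_min and r_max are hypothetical bounds chosen for this sketch.
    """
    if num_layers == 1:
        return [r_max]
    step = (r_max - r_min) / (num_layers - 1)
    return [round(r_min + i * step) for i in range(num_layers)]


def ema_update(stats, batch, beta=0.9):
    """Accumulate a per-feature input statistic across training steps.

    Assumption: the statistic is an exponential moving average of the
    root-mean-square activation of each input feature over the batch.
    """
    updated = []
    for j, old in enumerate(stats):
        rms = math.sqrt(sum(x[j] ** 2 for x in batch) / len(batch))
        updated.append(beta * old + (1 - beta) * rms)
    return updated


def prune_lowest(weights, input_stats, fraction):
    """Zero out the given fraction of weights with the lowest scores.

    Assumption: score(i, j) = |weights[i][j]| * input_stats[j], so a small
    weight on a strongly activated input can outlive a larger weight on a
    weakly activated one.
    """
    scores = [
        (abs(w) * input_stats[j], i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    ]
    k = int(len(scores) * fraction)
    pruned = [row[:] for row in weights]  # copy so the input is untouched
    for _, i, j in sorted(scores)[:k]:
        pruned[i][j] = 0.0
    return pruned


ranks = linear_ranks(num_layers=12, r_min=4, r_max=12)
stats = ema_update([0.0, 0.0], [[3.0, 4.0]], beta=0.5)
pruned = prune_lowest([[0.5, -0.1], [0.2, 0.4]], [1.0, 10.0], fraction=0.5)
```

In the last call, pruning by magnitude alone would remove the weights -0.1 and 0.2, whereas the input-weighted score removes 0.2 and 0.5, because the second input feature carries a much larger accumulated statistic.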