Fine-tuning Large Language Models (LLMs) on downstream data is often considered time-consuming and expensive. Structured pruning methods are primarily employed to improve the inference efficiency of pre-trained models; however, they often require additional time and memory for training, knowledge distillation, structure search, and other strategies, making efficient model fine-tuning hard to achieve. To enhance both the training and inference efficiency of downstream fine-tuning, we introduce GradPruner, which prunes layers of LLMs guided by gradients in the early stages of fine-tuning. GradPruner uses the cumulative gradients of each parameter during the initial phase of fine-tuning to compute the Initial Gradient Information Accumulation Matrix (IGIA-Matrix), which it uses to assess layer importance and perform pruning. We sparsify the pruned layers based on the IGIA-Matrix and merge them with the remaining layers; only elements with the same sign are merged, reducing interference from sign variations. We conducted extensive experiments on two LLMs across eight downstream datasets, covering medical, financial, and general benchmark tasks. The results demonstrate that GradPruner achieves a parameter reduction of 40% with only a 0.99% decrease in accuracy. Our code is publicly available.
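The pipeline described above can be sketched as follows. This is a minimal NumPy illustration under assumed details not specified in the abstract (absolute-gradient accumulation, mean-based layer scoring, and top-k sparsification are placeholder choices), not the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-step gradients for 4 "layers" of shape (8, 8)
num_layers, steps, shape = 4, 5, (8, 8)
grads = [[rng.normal(size=shape) for _ in range(steps)]
         for _ in range(num_layers)]

# IGIA-Matrix: accumulate each parameter's gradient magnitude over the
# initial fine-tuning steps (absolute accumulation is an assumption)
igia = [np.sum([np.abs(g) for g in layer_grads], axis=0)
        for layer_grads in grads]

# Layer importance: mean of the IGIA-Matrix entries (assumed scoring rule)
scores = np.array([m.mean() for m in igia])

# Prune the lowest-scoring layer(s) -- here 1 of 4, for illustration
pruned = list(np.argsort(scores)[:1])
kept = [i for i in range(num_layers) if i not in pruned]

def sign_matched_merge(kept_w, pruned_w, igia_m, keep_ratio=0.2):
    """Sparsify a pruned layer by its IGIA-Matrix (keep the top-k entries),
    then add only elements whose sign matches the retained layer."""
    k = int(keep_ratio * pruned_w.size)
    thresh = np.sort(igia_m.ravel())[-k]
    mask = igia_m >= thresh                       # sparsify by importance
    same_sign = np.sign(kept_w) == np.sign(pruned_w)
    return kept_w + np.where(mask & same_sign, pruned_w, 0.0)

weights = [rng.normal(size=shape) for _ in range(num_layers)]
merged = sign_matched_merge(weights[kept[0]], weights[pruned[0]],
                            igia[pruned[0]])
```

Merging only sign-matched elements means every update reinforces the retained weight's direction rather than partially cancelling it, which is the interference the abstract refers to.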