The growing size of language models has sparked considerable research interest in parameter-efficient fine-tuning methods (e.g., Adapter, LoRA, and prompt tuning), which freeze the pre-trained model and inject a small number of trainable parameters for multiple downstream tasks. To further improve fine-tuning efficiency, we propose a framework that integrates LoRA with structured layer pruning. In addition, we create two deidentified medical report summarization datasets based on MIMIC-IV-Note. We validate the integrated framework on these two datasets and on two medical dialogue datasets. By tuning 0.6% of the original model's parameters and pruning over 30% of its Transformer layers, the framework speeds up the training phase by 100% and reduces GPU memory usage by 50%, while preserving over 92% of generation quality on free-text sequence-to-sequence tasks.
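To make the parameter accounting behind combining LoRA with structured layer pruning concrete, the following is a minimal sketch and not the framework's actual implementation. All names (`LayerSpec`, `build_layers`, `prune_layers`, `trainable_fraction`) are hypothetical. The per-layer size of 12·d² frozen weights, the choice to apply LoRA only to two square projections per layer, and the rule for which layers to drop are all illustrative assumptions; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerSpec:
    """Hypothetical summary of one Transformer layer (not from the source)."""
    index: int
    frozen_params: int  # pre-trained weights, kept frozen during fine-tuning
    lora_params: int    # trainable low-rank parameters injected by LoRA


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


def build_layers(num_layers: int, d_model: int, rank: int) -> List[LayerSpec]:
    # Assumption: each layer holds roughly 12 * d_model^2 frozen weights
    # (four attention projections plus a feed-forward block with 4x expansion).
    frozen = 12 * d_model * d_model
    # Assumption: LoRA is attached to two d_model x d_model projections per layer.
    lora = 2 * lora_param_count(d_model, d_model, rank)
    return [LayerSpec(i, frozen, lora) for i in range(num_layers)]


def prune_layers(layers: List[LayerSpec], drop_indices: List[int]) -> List[LayerSpec]:
    """Structured layer pruning: remove whole layers, keeping the others in order."""
    drop = set(drop_indices)
    return [layer for layer in layers if layer.index not in drop]


def trainable_fraction(original: List[LayerSpec], pruned: List[LayerSpec]) -> float:
    """Trainable LoRA parameters in the pruned model, relative to the original frozen size."""
    trainable = sum(layer.lora_params for layer in pruned)
    original_total = sum(layer.frozen_params for layer in original)
    return trainable / original_total


if __name__ == "__main__":
    layers = build_layers(num_layers=12, d_model=768, rank=8)
    # Assumption: drop every third layer (4 of 12); the selection rule is illustrative only.
    kept = prune_layers(layers, [i for i in range(12) if i % 3 == 2])
    print(len(kept), trainable_fraction(layers, kept))
```

The sketch only counts parameters. It does not measure training speed, memory usage, or generation quality, so it says nothing about the reported figures beyond showing how the two reductions compose: pruning removes whole layers, and LoRA trains only the low-rank matrices in the layers that remain.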