Low-rank adaptation (LoRA) is a prominent parameter-efficient fine-tuning method that adds a small number of learnable parameters to frozen pre-trained weights. Motivated by the question ``Can the LoRA weights alone, without the pre-trained weights, produce sufficient representations at the final phase of fine-tuning?'', we introduce Progressive Compression LoRA~(PC-LoRA), which uses LoRA to perform model compression and fine-tuning simultaneously. PC-LoRA gradually removes the pre-trained weights during training, so that only the low-rank adapters remain at the end. These adapters thus replace the pre-trained weights entirely, achieving compression and fine-tuning at the same time. Empirical analysis across various models shows that PC-LoRA achieves parameter and FLOPs compression rates of 94.36% and 89.1%, respectively, for vision models such as ViT-B, and 93.42% and 84.2% for language models such as BERT.
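The abstract describes the pre-trained weights being removed gradually until only the adapters remain. A minimal sketch of one way a single linear layer could do this, not the authors' implementation, is to compute h = λ(t)·W0·x + B·A·x, where the frozen weight W0 is scaled by a decay factor λ(t) that falls from 1 to 0 over training while the LoRA path B·A·x stays active. The cosine shape of λ(t), the helper names `decay_factor`, `pc_lora_forward`, and `param_compression`, and the rank used below are hypothetical choices for illustration; the abstract does not specify them. Plain Python lists keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def decay_factor(step: int, total_decay_steps: int) -> float:
    """Hypothetical cosine schedule: 1.0 at step 0, 0.0 from total_decay_steps on."""
    if step >= total_decay_steps:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * step / total_decay_steps))


def pc_lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector,
                    step: int, total_decay_steps: int) -> Vector:
    """Sketch of h = lambda(t) * W0 x + B (A x).

    w0: frozen pre-trained weight, d_out x d_in
    a:  LoRA down-projection, r x d_in
    b:  LoRA up-projection, d_out x r
    """
    lam = decay_factor(step, total_decay_steps)
    base = matvec(w0, x)                 # frozen path, faded out by lam
    low_rank = matvec(b, matvec(a, x))   # adapter path, always active
    return [lam * p + q for p, q in zip(base, low_rank)]


def param_compression(d_in: int, d_out: int, r: int) -> float:
    """Fraction of one layer's weight parameters removed when W0 (d_in*d_out)
    is dropped and only A and B (r*(d_in + d_out)) are kept. Biases ignored."""
    full = d_in * d_out
    kept = r * (d_in + d_out)
    return 1.0 - kept / full


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 1.0]]          # r = 1
    b = [[0.5], [0.5]]
    x = [2.0, 4.0]
    print(pc_lora_forward(w0, a, b, x, step=0, total_decay_steps=10))   # W0 fully active
    print(pc_lora_forward(w0, a, b, x, step=10, total_decay_steps=10))  # adapters only
    print(param_compression(768, 768, 32))  # hypothetical rank 32, one 768x768 layer
```

Once λ reaches 0, W0 no longer affects the output and can be discarded, so each layer stores r·(d_in + d_out) parameters instead of d_in·d_out. For a single 768×768 layer with the hypothetical rank 32, `param_compression` gives about 0.917; this illustrates the arithmetic only and is not meant to reproduce the reported whole-model figures, which depend on the ranks actually used and on components this sketch does not model.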