Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly improves training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline remains under-explored, and we observe substantial performance degradation in this setting, primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA uses rotation to eliminate outliers and introduces rotation-aware fine-tuning to preserve the outlier-free characteristics of rotated LLMs. Experimental results show that RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA on LLaMA2-7B/13B and LLaMA3-8B, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Code is available at https://github.com/HuangOwen/RoLoRA
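The idea of using rotation to suppress activation outliers can be illustrated with a small NumPy sketch. This is not the RoLoRA implementation: it is a generic toy example that assumes a normalized Hadamard matrix as the rotation, a single artificially scaled outlier channel, and a simple per-token round-to-nearest 4-bit quantizer. All names (`hadamard`, `fake_quant`, the sizes `d`, `tokens`, `out`) are hypothetical and chosen only for illustration. It shows that rotating the activations and counter-rotating the weights leaves the layer output unchanged, while spreading the outlier across channels so that low-bit quantization loses less information.

```python
import numpy as np

def hadamard(d):
    """Orthonormal Sylvester-Hadamard matrix; d must be a power of two."""
    H = np.ones((1, 1))
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    assert H.shape[0] == d, "d must be a power of two"
    return H / np.sqrt(d)

def fake_quant(x, bits=4):
    """Symmetric per-row (per-token) round-to-nearest quantize-dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
d, tokens, out = 64, 128, 32            # toy sizes (assumption)

X = rng.standard_normal((tokens, d))    # activations
X[:, 3] *= 50.0                         # one artificial outlier channel
W = rng.standard_normal((d, out))       # weights of a linear layer

R = hadamard(d)                         # rotation with R @ R.T = I
X_rot = X @ R                           # rotated activations
W_rot = R.T @ W                         # counter-rotated weights

# The full-precision output is unchanged: (X R)(R^T W) = X W.
assert np.allclose(X @ W, X_rot @ W_rot)

# The rotation spreads the outlier over all channels.
print("max |X|     :", np.abs(X).max())
print("max |X_rot| :", np.abs(X_rot).max())

# 4-bit activation quantization error with and without rotation.
err_plain = np.linalg.norm(fake_quant(X) - X)
err_rot = np.linalg.norm(fake_quant(X_rot) - X_rot)
print("quantization error, plain  :", err_plain)
print("quantization error, rotated:", err_rot)
```

Because `R` is orthogonal, the error measured on `X_rot` has the same norm as the error it induces in the original activation space, so the two printed error values are directly comparable. How the rotation interacts with the LoRA adapters during fine-tuning is the subject of the rotation-aware fine-tuning proposed in the paper and is not modeled here.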