Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly enhances training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA utilizes rotation for outlier elimination and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics in rotated LLMs. Experimental results show that RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA across LLaMA2-7B/13B and LLaMA3-8B models, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Code is available at https://github.com/HuangOwen/RoLoRA
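The core mechanism the abstract refers to — applying an orthogonal rotation to activations while folding the inverse rotation into the weights, so the layer output is unchanged but outliers are spread across channels — can be illustrated with a minimal sketch. This toy example uses a normalized Hadamard matrix as the rotation; it is an illustration of the general rotation-invariance idea under assumed shapes, not the paper's actual implementation.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester-construction Hadamard matrix, normalized so H @ H.T == I.

    n must be a power of 2.
    """
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)
d = 8
# Toy activation with one large outlier channel, as observed in LLMs
x = rng.standard_normal((1, d))
x[0, 3] = 50.0
W = rng.standard_normal((d, d))

R = hadamard(d)     # orthogonal: R @ R.T == I
x_rot = x @ R       # rotate the activations
W_rot = R.T @ W     # fold the inverse rotation into the weights

# The layer output is mathematically unchanged...
assert np.allclose(x @ W, x_rot @ W_rot)
# ...but the outlier's magnitude is spread across all channels,
# which shrinks the dynamic range that activation quantization must cover.
print("max |x| before:", np.abs(x).max(), "after:", np.abs(x_rot).max())
```

Because `x @ W == (x @ R) @ (R.T @ W)` for any orthogonal `R`, the rotation is computationally free at inference once `R.T` is merged into the weight matrix; only the flattened activation distribution changes, which is what makes low-bit weight-activation quantization tractable.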