Among widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have become especially popular because they add no extra inference cost. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Building on these findings and aiming to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, and fine-tunes both, using LoRA for the directional updates so that the number of trainable parameters stays small. With DoRA, we improve both the learning capacity and the training stability of LoRA without any additional inference overhead. DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, including commonsense reasoning, visual instruction tuning, and image/video-text understanding.
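The abstract does not state the update rule, so the following is only a minimal sketch under stated assumptions. It assumes the formulation commonly associated with DoRA, W' = m · (W0 + BA) / ‖W0 + BA‖_c, where ‖·‖_c is the column-wise Euclidean norm, the magnitude vector m is initialized to ‖W0‖_c, and the LoRA factor B is zero-initialized as in LoRA. The class `DoRALinearSketch`, the helper `column_norm`, and the NumPy setting are hypothetical illustrations, not the authors' implementation, and no training loop is shown.

```python
import numpy as np


def column_norm(w: np.ndarray) -> np.ndarray:
    """Euclidean norm of each column of w, returned with shape (1, k)."""
    return np.linalg.norm(w, axis=0, keepdims=True)


class DoRALinearSketch:
    """Hypothetical DoRA-style layer (assumed rule): W' = m * (W0 + B @ A) / ||W0 + B @ A||_c."""

    def __init__(self, w0: np.ndarray, rank: int, rng: np.random.Generator):
        d, k = w0.shape
        self.w0 = w0                                      # frozen pre-trained weight, shape (d, k)
        self.m = column_norm(w0)                          # trainable magnitude, init to ||W0||_c, shape (1, k)
        self.b = np.zeros((d, rank))                      # LoRA factor B, zero-init so B @ A = 0 at start
        self.a = rng.normal(scale=0.01, size=(rank, k))   # LoRA factor A, small random init

    def merged_weight(self) -> np.ndarray:
        v = self.w0 + self.b @ self.a                     # direction updated through the low-rank term
        return self.m * v / column_norm(v)                # unit-norm columns rescaled by the magnitudes

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x has shape (k, n); the output has shape (d, n)
        return self.merged_weight() @ x
```

Because `merged_weight()` is an ordinary dense (d, k) matrix, it can be computed once after training and used in place of W0. Under these assumptions, this is why the decomposition need not add inference overhead. At initialization the merged weight equals W0, and after any update to B or A each column of the merged weight still has norm equal to the corresponding entry of m, so magnitude and direction remain separately controlled.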