Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Building on these findings and aiming to resemble the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.
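The decomposition described above can be illustrated with a minimal NumPy sketch: the pre-trained weight is split into a per-column magnitude and a column-normalized direction, and only the direction is updated through low-rank LoRA factors. The matrix sizes, initialization scales, and variable names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-trained weight of shape (d_out, d_in); sizes are illustrative.
d_out, d_in, r = 8, 6, 2
W0 = rng.normal(size=(d_out, d_in))

# Magnitude component: per-column L2 norm of the pre-trained weight
# (a trainable scalar per column in DoRA).
m = np.linalg.norm(W0, axis=0, keepdims=True)  # shape (1, d_in)

# LoRA factors for the directional update; B starts at zero, as in LoRA,
# so the initial update is zero.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

# Direction component plus its low-rank update, then re-normalized
# column-wise; the merged weight is magnitude times direction, so no
# extra matrices remain at inference time.
V = W0 + B @ A
W = m * (V / np.linalg.norm(V, axis=0, keepdims=True))

# At initialization (B = 0) the merged weight equals the pre-trained weight.
assert np.allclose(W, W0)
```

Because the factors can be folded back into a single weight matrix after training, this sketch also shows why the method adds no inference overhead relative to the frozen model.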