Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Guided by these findings and aiming to approach the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, and employs LoRA specifically for the directional updates to keep the number of trainable parameters small. By employing DoRA, we enhance both the learning capacity and the training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.
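The magnitude/direction decomposition described above can be illustrated with a small NumPy sketch. It is not the reference implementation from the linked repository. It assumes the merged weight has the form `m * (W0 + B @ A) / ||W0 + B @ A||`, with the norm taken per column, and that `B` starts at zero as in the usual LoRA convention. The function names `dora_init` and `dora_merge`, the matrix sizes, and the initialization scale are hypothetical choices made for illustration.

```python
import numpy as np

def dora_init(W0, rank, rng):
    """Split W0 into a trainable magnitude and set up LoRA factors (illustrative)."""
    d, k = W0.shape
    # Magnitude: one scalar per column of W0, shape (1, k). Assumed column-wise norm.
    m = np.linalg.norm(W0, axis=0, keepdims=True)
    # Low-rank factors for the directional update; B = 0 so that B @ A = 0 at the start.
    B = np.zeros((d, rank))
    A = rng.normal(scale=0.01, size=(rank, k))
    return m, B, A

def dora_merge(W0, m, B, A):
    """Recombine magnitude and (LoRA-updated) direction into a single weight matrix."""
    V = W0 + B @ A                                   # updated, unnormalized direction
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    return m * (V / col_norm)                        # unit-norm columns rescaled by m

rng = np.random.default_rng(0)
W0 = rng.normal(size=(8, 6))                         # stand-in for a pre-trained weight
m, B, A = dora_init(W0, rank=2, rng=rng)

W_merged = dora_merge(W0, m, B, A)                   # same shape as W0
trainable = m.size + B.size + A.size                 # parameters updated during tuning
```

Under these assumptions the merged matrix reproduces `W0` exactly before any training step, and because it has the same shape as `W0` it can replace the original weight after training, which is consistent with the claim of no additional inference overhead. In this toy setting the trainable parameters are `m`, `B`, and `A` (34 values) rather than all 48 entries of `W0`.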