Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT based on these findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for the directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
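The magnitude/direction decomposition can be sketched as follows. In the paper's formulation, the merged weight takes the form W' = m · (W0 + BA) / ||W0 + BA||_c, where m is a learnable per-column magnitude vector, BA is the standard LoRA low-rank update, and ||·||_c is the column-wise L2 norm. The sketch below is an illustrative, dependency-free implementation on nested-list matrices, not the authors' code; the function names are ours.

```python
import math

def matmul(X, Y):
    """Naive matrix product for small nested-list matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def dora_weight(W0, m, B, A):
    """Merged DoRA weight: W' = m * (W0 + B A) / ||W0 + B A||_c.

    W0: d x k pre-trained weight; B: d x r, A: r x k low-rank factors
    (the LoRA directional update); m: length-k magnitude vector, one
    learnable entry per column.
    """
    d, k = len(W0), len(W0[0])
    delta = matmul(B, A)  # rank-r directional update, as in LoRA
    V = [[W0[i][j] + delta[i][j] for j in range(k)] for i in range(d)]
    # Column-wise L2 norms: normalizing each column isolates direction,
    # so the magnitude component m is trained separately.
    norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(d))) for j in range(k)]
    return [[m[j] * V[i][j] / norms[j] for j in range(k)] for i in range(d)]

# Tiny example: identity W0 with a rank-1 update on the first row.
W0 = [[1.0, 0.0], [0.0, 1.0]]
B, A = [[1.0], [0.0]], [[0.0, 1.0]]   # B @ A = [[0, 1], [0, 0]]
m = [1.0, 1.0]
W = dora_weight(W0, m, B, A)
```

Because m, B, and A can be folded into W' after training, the merged weight is a single dense matrix of the original shape, which is why DoRA (like LoRA) adds no inference overhead.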