In this paper, we present Delta-LoRA, a novel parameter-efficient approach to fine-tuning large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $\bA$ and $\bB$ but also propagates the learning to the pre-trained weights $\bW$, updating them with the delta of the product of the two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$). This strategy addresses the limitation that incrementally updating the low-rank matrices alone is inadequate for learning representations suitable for downstream tasks. Moreover, because updating $\bW$ requires neither computing the gradients of $\bW$ nor storing their momentum terms, Delta-LoRA has memory requirements and computational costs comparable to those of LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
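The update rule described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the plain SGD step on the low-rank factors, the `scale` and `lr` parameters, the zero initialization of `B`, and the toy least-squares loss are all assumptions made here for illustration. The only part taken from the abstract is that the pre-trained weight is shifted by the difference between the new and old low-rank products, without computing a gradient for it.

```python
import numpy as np


def delta_lora_step(W, A, B, grad_A, grad_B, lr=0.01, scale=1.0):
    """Return (W_new, A_new, B_new) after one illustrative update."""
    # Plain SGD on the low-rank factors (an assumption of this sketch).
    A_new = A - lr * grad_A
    B_new = B - lr * grad_B
    # W receives no gradient of its own: it is shifted by the change in A @ B.
    # `scale` is a hypothetical multiplier; scale=1.0 gives the raw delta.
    W_new = W + scale * (A_new @ B_new - A @ B)
    return W_new, A_new, B_new


def low_rank_grads(W, A, B, X, Y):
    """Gradients w.r.t. A and B of 0.5 * ||X @ (W + A @ B) - Y||_F^2 / n."""
    residual = X @ (W + A @ B) - Y
    G = X.T @ residual / X.shape[0]  # gradient w.r.t. the effective weight
    return G @ B.T, A.T @ G


# Toy data with hypothetical sizes.
rng = np.random.default_rng(0)
d_in, d_out, rank, n = 6, 4, 2, 32
W = rng.normal(size=(d_in, d_out))   # stands in for a pre-trained weight
A = rng.normal(size=(d_in, rank))
B = np.zeros((rank, d_out))          # zero init, so A @ B starts at zero
X = rng.normal(size=(n, d_in))
Y = rng.normal(size=(n, d_out))

grad_A, grad_B = low_rank_grads(W, A, B, X, Y)
W_new, A_new, B_new = delta_lora_step(W, A, B, grad_A, grad_B)
```

In this sketch the only optimizer state needed is for `A` and `B`; the step on `W` reuses quantities that are already available, which is consistent with the claim that memory and compute stay close to those of LoRA.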