Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method that freezes the pretrained model weights and trains an additive low-rank matrix. In this work, we study how to enhance LoRA training by introducing an $r \times r$ preconditioner in each gradient step, where $r$ is the LoRA rank. We show theoretically that the proposed preconditioner stabilizes feature learning with LoRA in the infinite-width neural network setting. In practice, implementing the preconditioner requires only a small change to existing optimizer code and adds negligible storage and runtime overhead. Our experiments with both large language models and text-to-image diffusion models show that the new preconditioner significantly improves the convergence and reliability of SGD and AdamW. Moreover, training becomes much more robust to hyperparameter choices such as the learning rate. The new preconditioner can be derived from a novel Riemannian metric in the low-rank matrix field. Code is available at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA.
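The abstract does not state the update rule, so the following is only a minimal sketch of what an $r \times r$ preconditioned step could look like, not the authors' implementation. It assumes the LoRA update is parameterized as $W = W_0 + BA$ with $B \in \mathbb{R}^{m \times r}$ and $A \in \mathbb{R}^{r \times n}$, and that each factor's gradient is scaled by the inverse Gram matrix of the other factor, in the style of scaled gradient descent. The function name `preconditioned_lora_step` and the damping parameter `delta` are hypothetical and introduced here for illustration.

```python
import numpy as np


def preconditioned_lora_step(A, B, grad_A, grad_B, lr=0.1, delta=1e-6):
    """One plain gradient step on the LoRA factors with r x r preconditioners.

    Assumed (hypothetical) update rule, for W = W0 + B @ A:
        B <- B - lr * grad_B @ inv(A @ A.T + delta * I)
        A <- A - lr * inv(B.T @ B + delta * I) @ grad_A
    A has shape (r, n), B has shape (m, r); both preconditioners are r x r.
    """
    r = A.shape[0]
    eye = np.eye(r)

    # r x r Gram matrices of the *other* factor, damped for invertibility.
    precond_for_B = A @ A.T + delta * eye
    precond_for_A = B.T @ B + delta * eye

    # grad_B @ inv(P) computed via a linear solve; valid because P is symmetric.
    step_B = np.linalg.solve(precond_for_B, grad_B.T).T
    # inv(P) @ grad_A computed via a linear solve.
    step_A = np.linalg.solve(precond_for_A, grad_A)

    return A - lr * step_A, B - lr * step_B
```

Because the only extra work is forming and solving two $r \times r$ systems, the added cost is small when $r$ is much smaller than the layer dimensions, which is consistent with the negligible overhead described above. In this sketch, with `delta=0`, rescaling the factors as $A \to cA$, $B \to B/c$ leaves the updated product $BA$ unchanged, which gives some intuition for why such a step can be less sensitive to the learning rate.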