Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
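To make the setting concrete, below is a minimal sketch of a LoRA-adapted linear layer in PyTorch. The class name, initialization scheme, and hyperparameters are our own illustrative choices, not notation from the paper; the only property it is meant to convey is that the pretrained weight $W_0$ stays frozen while a trainable rank-$r$ update $BA$ is learned.

```python
# A minimal LoRA sketch (illustrative only; names and init are assumptions,
# not the paper's notation or a reference implementation).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W0 plus a trainable low-rank update B @ A."""

    def __init__(self, in_features: int, out_features: int, r: int):
        super().__init__()
        # Pretrained weight, frozen during fine-tuning.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Low-rank factors: the effective update B @ A has rank at most r.
        self.A = nn.Parameter(torch.randn(r, in_features) / r**0.5)
        # Zero init for B so fine-tuning starts exactly at the pretrained W0.
        self.B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W0^T + x (B A)^T; gradients flow only through A and B.
        return x @ self.weight.T + (x @ self.A.T) @ self.B.T
```

Under result (ii), the rank would be chosen to scale as $r\gtrsim \sqrt{N}$; for instance, with $N = 10{,}000$ fine-tuning examples, this suggests an adapter rank on the order of $r \approx 100$.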