Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when LoRA-rank is lower than the threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.
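To make the two rank conditions in the abstract concrete, the following is a minimal arithmetic sketch, not the paper's construction. The function names `fnn_rank_threshold` and `transformer_rank` are hypothetical. Rounding up to the next integer is an assumption made here because a LoRA rank must be a whole number; the abstract states the thresholds only as ratios.

```python
def fnn_rank_threshold(width_f: int, depth_target: int, depth_f: int) -> int:
    """Smallest integer rank r with r >= width_f * depth_target / depth_f.

    width_f      -- width of the frozen model f
    depth_target -- depth of the (smaller) target model
    depth_f      -- depth of the frozen model f
    Rounding up is an assumption of this sketch.
    """
    if min(width_f, depth_target, depth_f) <= 0:
        raise ValueError("width and depths must be positive integers")
    # Integer ceiling division avoids floating-point rounding issues.
    return (width_f * depth_target + depth_f - 1) // depth_f


def transformer_rank(embedding_size: int) -> int:
    """Rank (embedding size / 2), rounded up for odd sizes (an assumption)."""
    if embedding_size <= 0:
        raise ValueError("embedding size must be a positive integer")
    return (embedding_size + 1) // 2


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    print(fnn_rank_threshold(width_f=512, depth_target=2, depth_f=8))
    print(transformer_rank(embedding_size=768))
```

Under these illustrative sizes, a frozen network four times deeper than the target needs a rank of only a quarter of its width, which reflects the depth ratio in the stated condition.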