Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that applies low-rank updates to weight matrices, has become a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite LoRA's success in practice, its theoretical underpinnings have remained largely unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when the LoRA-rank is below this threshold. For Transformer networks, we show that any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.
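To make the rank condition concrete, the following minimal Python sketch evaluates the stated threshold for given widths and depths, and illustrates the low-rank idea on a single linear layer. It is an illustration under assumptions, not the paper's construction: the function names `lora_rank_threshold` and `best_rank_r_update` are hypothetical, rounding the threshold up to an integer is an assumption, and the single-layer example relies on the standard Eckart-Young result for truncated SVD rather than on the paper's analysis of deep networks.

```python
import numpy as np


def lora_rank_threshold(width_f: int, depth_f: int, depth_target: int) -> int:
    """Smallest integer r with r >= width_f * depth_target / depth_f.

    Rounding up is an assumption made here because a rank must be an integer.
    """
    return (width_f * depth_target + depth_f - 1) // depth_f


def best_rank_r_update(w_frozen: np.ndarray, w_target: np.ndarray, r: int):
    """Rank-r factors (B, A) minimizing the spectral norm of W_target - (W_frozen + B A).

    Single linear layer only: by the Eckart-Young theorem the truncated SVD of
    the weight gap is optimal, and the leftover error is the (r+1)-th singular value.
    """
    u, s, vt = np.linalg.svd(w_target - w_frozen)
    b = u[:, :r] * s[:r]  # shape (d_out, r), columns scaled by singular values
    a = vt[:r, :]         # shape (r, d_in)
    return b, a


# Illustrative sizes: a frozen model of width 512 and depth 8, target depth 2.
print(lora_rank_threshold(512, 8, 2))

# Single-layer illustration with random (hypothetical) weights.
rng = np.random.default_rng(0)
d = 8
w_frozen = rng.standard_normal((d, d))
w_target = rng.standard_normal((d, d))
gap_singular_values = np.linalg.svd(w_target - w_frozen, compute_uv=False)

for r in (2, 4, 8):
    b, a = best_rank_r_update(w_frozen, w_target, r)
    err = np.linalg.norm(w_target - (w_frozen + b @ a), ord=2)
    print(f"rank {r}: spectral-norm error {err:.3e}")
```

In the single-layer case the error shrinks as the rank grows and vanishes at full rank, which mirrors, in a much simpler setting, the abstract's distinction between exact representation at or above the threshold and a quantified approximation error below it.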