Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited, as prior analyses of LoRA's training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a ``special regime'', which includes idealized setups where linearization arguments hold, and a ``generic regime'' representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges either to a global minimizer with low rank and small magnitude or to a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space -- where global minima lie -- thus shedding light on why LoRA training usually succeeds in finding global minima.
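To make the objects in the abstract concrete, the following is a minimal NumPy sketch of the standard LoRA parameterization: a frozen weight $W$ plus a rank-$r$ update $\frac{\alpha}{r} BA$, where $B$ is zero-initialized (so training starts at the pretrained model) and weight decay penalizes the factor magnitudes. The dimensions, scaling factor, and decay coefficient below are illustrative assumptions, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2   # illustrative dimensions; adapter rank r << d
alpha = 4.0                # LoRA scaling hyperparameter (assumed value)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # factor A: small random init
B = np.zeros((d_out, r))                    # factor B: zero init, so BA = 0 at start

def lora_forward(x, W, A, B, alpha=alpha, r=r):
    """Adapted forward pass: W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Zero-initialization of B means the adapter contributes nothing initially:
# the adapted model's output coincides with the base model's output.
assert np.allclose(lora_forward(x, W, A, B), W @ x)

# Weight decay on the trainable factors penalizes ||A||_F^2 + ||B||_F^2,
# the magnitude term whose implicit bias the abstract refers to.
decay = 0.01
wd_penalty = decay * (np.sum(A**2) + np.sum(B**2))
```

The update $BA$ has rank at most $r$ by construction; the abstract's low-rank vs. high-rank distinction concerns the effective rank and magnitude of the solution this parameterization converges to.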