Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited, as prior analyses of its training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a ``special regime'', which includes idealized setups where linearization arguments hold, and a ``generic regime'' representing more realistic setups where they do not. In the generic regime, we show that LoRA training converges either to a global minimizer with low rank and small magnitude, or to a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero initialization and weight decay used in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space -- where the global minima lie -- thus shedding light on why LoRA training usually succeeds in finding global minima.
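The mechanism the abstract appeals to -- a zero-initialized factor so the adapter starts exactly at the pretrained weights, plus weight decay shrinking both factors -- can be sketched in a few lines of NumPy. This is a minimal illustration under assumed dimensions and constants, not the paper's actual setup:

```python
import numpy as np

# Minimal LoRA sketch (dimensions and constants are illustrative).
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4               # adapter rank r is much smaller than d

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable factor, small random init
B = np.zeros((d_out, r))                    # trainable factor, zero init

def lora_forward(x):
    """Adapted layer: frozen W plus the low-rank update B @ A."""
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
# Zero initialization: the adapter starts exactly at the pretrained model.
assert np.allclose(lora_forward(x), W @ x)

# Decoupled weight decay shrinks each factor toward zero every step,
# pulling the parameters toward the small-magnitude region.
wd = 0.01
A *= 1.0 - wd
B *= 1.0 - wd

# Whatever values training produces, the adapter's rank stays bounded by r.
B = 0.1 * rng.standard_normal((d_out, r))   # stand-in for B after some updates
assert np.linalg.matrix_rank(B @ A) <= r
```

The two assertions mirror the two halves of the claimed implicit bias: the zero-initialized product B @ A makes training start at the pretrained model, and the rank of the update can never exceed the bottleneck dimension r.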