Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning, owing to the reduced number of trainable parameters and lower memory footprint enabled by the Burer-Monteiro factorization of the adaptation matrices. However, classical LoRA training methods treat the low-rank factor matrices individually and optimize them with standard gradient-based algorithms. Such decoupled optimization schemes are theoretically and empirically suboptimal, as they fail to fully exploit the intrinsic structure of the LoRA parameterization. In this work, we propose a novel continuous-time optimization dynamic for the LoRA factor matrices, formulated as an ordinary differential equation (ODE) that emulates the gradient flow of full fine-tuning on the balanced manifold. We term this approach ODELoRA. To faithfully track the trajectories of ODELoRA, we adopt well-established and theoretically grounded time-discretization schemes, including Euler and Runge--Kutta methods. Our framework provides a unified ODE-based perspective for understanding and designing LoRA training algorithms. Under mild conditions, we establish linear convergence of the proposed method for strongly convex objectives with certain discretization schemes, and further extend the analysis to the matrix sensing setting. Moreover, we show that ODELoRA achieves stable feature learning, a property crucial for training deep neural networks across different scales of problem dimensionality. Empirical results on matrix sensing tasks confirm the derived linear convergence behavior, and experiments on training physics-informed neural networks further demonstrate the superiority of ODELoRA over existing baselines, especially in terms of training stability.
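To make the ODE-discretization viewpoint concrete, the following is a minimal, hedged sketch: a forward-Euler discretization of the *naive* decoupled LoRA gradient flow dA/dt = -∇_A L, dB/dt = -∇_B L on a toy least-squares (matrix-factorization) objective. The abstract does not specify the ODELoRA dynamic itself, so this sketch illustrates only the baseline flow that ODELoRA is contrasted against; all variable names and the objective are illustrative assumptions.

```python
import numpy as np

# Toy objective (assumed for illustration, not from the paper):
#   L(B, A) = 0.5 * ||B @ A - W_star||_F^2,
# where W = B @ A is the LoRA (Burer-Monteiro) parameterization of the
# adaptation matrix and W_star is a rank-r target.
rng = np.random.default_rng(0)
d, r = 8, 2                                        # ambient dim, LoRA rank
W_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

B = 0.1 * rng.standard_normal((d, r))              # small random init
A = 0.1 * rng.standard_normal((r, d))

def loss(B, A):
    return 0.5 * np.linalg.norm(B @ A - W_star) ** 2

h = 0.05                                           # Euler step size
losses = [loss(B, A)]
for _ in range(500):
    R = B @ A - W_star                             # residual
    grad_B, grad_A = R @ A.T, B.T @ R              # gradients of L w.r.t. B, A
    # One forward-Euler step of the decoupled gradient flow:
    B, A = B - h * grad_B, A - h * grad_A
    losses.append(loss(B, A))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3e}")
```

Replacing the Euler step with a higher-order Runge--Kutta update on the same vector field is the kind of principled discretization the abstract refers to; the coupled "balanced manifold" dynamic that defines ODELoRA would modify the vector field itself, not just the integrator.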