With the widespread adoption of LLMs, LoRA has become a dominant method for parameter-efficient fine-tuning (PEFT), and its initialization strategies have attracted increasing attention. However, existing methods have notable limitations. Many do not incorporate target-domain data at all, while gradient-based methods exploit data only at a shallow level, relying on a one-step gradient decomposition. This remains unsatisfactory for two reasons: the one-step fine-tuned model that serves as their basis performs poorly in practice, and these methods either lack a rigorous theoretical foundation or depend heavily on restrictive isotropy assumptions. In this paper, we establish a theoretical framework for data-aware LoRA initialization based on asymptotic analysis. Starting from a general objective that minimizes the expected parameter discrepancy between the fine-tuned and target models, we derive an optimization problem with two components: a bias term, related to the parameter distance between the fine-tuned and target models and approximated via a Fisher-gradient formulation that preserves anisotropy; and a variance term, which accounts for the uncertainty introduced by sampling stochasticity through the Fisher information. Solving this problem yields an optimal LoRA initialization strategy. Building on this framework, we develop an efficient algorithm, LoRA-DA, which estimates the terms of the optimization problem from a small set of target-domain samples and produces the optimal LoRA initialization. Empirical results across multiple benchmarks show that LoRA-DA consistently improves final accuracy over existing initialization methods. Further studies demonstrate faster and more stable convergence, robustness across ranks, and only a small initialization overhead. The source code will be released upon publication.
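To make the data-aware pipeline concrete, the following is a minimal illustrative sketch, not the paper's exact derivation: it assumes per-sample gradients from a few target-domain examples, a diagonal empirical Fisher estimate, and a Fisher-weighted SVD to extract a rank-r initialization. All shapes, weightings, and scaling choices here are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sketch of a Fisher-weighted low-rank initialization.
# The real LoRA-DA objective (bias + variance terms) is not reproduced here;
# this only illustrates the general shape of such a data-aware pipeline.

rng = np.random.default_rng(0)
d_out, d_in, r, n = 8, 6, 2, 32          # layer dims, LoRA rank, sample count

# Stand-in for per-sample gradients of the loss w.r.t. a weight matrix W,
# as estimated from a small set of target-domain samples.
grads = rng.normal(size=(n, d_out, d_in))

G = grads.mean(axis=0)                    # averaged one-step gradient
F = (grads ** 2).mean(axis=0)             # diagonal empirical Fisher estimate

# Anisotropic weighting: rescale the gradient by the inverse square root
# of the Fisher diagonal before extracting a rank-r subspace.
M = G / np.sqrt(F + 1e-8)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
B = U[:, :r] * np.sqrt(S[:r])             # (d_out, r) factor
A = np.sqrt(S[:r])[:, None] * Vt[:r]      # (r, d_in) factor

delta_W = B @ A                           # rank-r initialization of the LoRA update
```

Because `delta_W = B @ A` with inner dimension `r`, the initialized update has rank at most `r` by construction, regardless of how the weighting matrix `M` is chosen.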