In Low-Rank Adaptation (LoRA), the scaling factor $α$ is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we show that $α$ and the learning rate play different roles: $α$ is the dominant driver of effective optimization, delivering gains that learning rate scaling alone cannot replicate. Combining extensive empirical analysis with a theoretical Signal-Drift framework, we report three findings about LoRA's scaling mechanism. First, LoRA's spectral suppression smooths the optimization landscape, which makes standard hyperparameters overly conservative and creates an optimization gap. Second, when this smoothness is exploited to accelerate convergence, $α$ outperforms the learning rate because it amplifies the task signal without increasing the drift ratio. Third, the optimal scaling factor grows sublinearly with the rank and is well characterized by a square-root law with an unexpectedly large coefficient, which shows that existing rank-tied heuristics scale insufficiently. Based on these insights, we propose LoRA-$α$, a minimalist framework that restores $α$ to its principled regime and makes LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks demonstrate that LoRA-$α$ consistently improves performance while streamlining hyperparameter search, unleashing the learning potential of LoRA.
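To make the role of the scaling factor concrete, the following is a minimal NumPy sketch of a LoRA-style forward pass in which a single multiplier, called `scale` here, weights the low-rank update. It is an illustration under stated assumptions, not the LoRA-$α$ method itself: the names `lora_forward` and `sqrt_law_scale` and the argument `coeff` are hypothetical, the abstract does not give the value of the square-root-law coefficient, and it does not state whether $α$ denotes this multiplier directly or the numerator of the common $α/r$ convention.

```python
import numpy as np


def lora_forward(x, W, A, B, scale):
    """Frozen base projection plus a scaled low-rank update.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    `scale` is the multiplier on the low-rank branch (an assumed convention).
    """
    return x @ W.T + scale * (x @ A.T @ B.T)


def sqrt_law_scale(rank, coeff):
    """Hypothetical square-root rule: the scaling factor grows as coeff * sqrt(rank).

    `coeff` is a placeholder; its value is not given in the abstract.
    """
    return coeff * np.sqrt(rank)


rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 4
x = rng.normal(size=(2, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

# With scale = 0 the adapter contributes nothing, so this is the base output.
base = lora_forward(x, W, A, B, scale=0.0)

# The adapter's contribution to the output is linear in the scaling factor.
delta_1 = lora_forward(x, W, A, B, scale=1.0) - base
delta_2 = lora_forward(x, W, A, B, scale=2.0) - base
```

Under a square-root rule, quadrupling the rank only doubles the scaling factor, whereas a rule that ties the factor linearly to the rank would quadruple it. This is the sense in which the relationship described above is sublinear.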