Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about how well these approaches generalize relative to one another and how well the structural restrictions on low-rank updates preserve effective adaptation performance. We present a historical framing that covers the past (full fine-tuning and the original LoRA) and the present (different variants of LoRA), and we propose simpler, cheaper, parameter-efficient extensions that induce sparsity within existing LoRA variants: Cheap LoRA (cLA), which trains a single low-rank factor while keeping the other fixed (deterministically or, in its randomized variant, stochastically), and the chained circulant variant, ${c}^3$LA. We frame cLA as a structured instance of asymmetric LoRA that serves as a controlled column-subspace restriction of full fine-tuning. We derive information-theoretic generalization error bounds for these variants, one of the first such efforts in this area. Empirically, we evaluate 11 fine-tuning methods across 10 pre-trained models and 14 datasets, analyzing the fine-tuned models' performance and generalization using tools such as loss landscapes and spectral analysis. Although fine-tuned models are sensitive to the pre-trained model, the dataset, and other factors, our study suggests that restricting the adaptation of LoRA-based parameter-efficient fine-tuning (PEFT) methods to a sparse, structured column space remains competitive with parameter-matched baselines across tasks while reducing training time by up to 10% and peak GPU memory by up to 15%, even with a naïve, non-optimized sparse implementation. Our theoretical and empirical generalization measures provide a more consistent and principled basis for cost-effective adaptation than commonly used analytical tools. An overview and code are available at: https://elicaden.github.io/Beyond_LoRA/.
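To make the idea of training a single low-rank factor concrete, the following is a minimal sketch and not the authors' implementation. It assumes the update has the form $\Delta W = BA$ with $B$ frozen and $A$ trained; which factor is frozen, the choice of identity columns (deterministic) or Gaussian entries (randomized) for the frozen factor, the squared-error loss, and all function names are illustrative assumptions that do not come from the abstract.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def fixed_factor(d, r, seed=None):
    """Hypothetical frozen factor B (d x r).

    seed=None -> deterministic choice: first r columns of the identity.
    seed=int  -> randomized choice: i.i.d. Gaussian entries.
    Both choices are assumptions made for illustration only.
    """
    if seed is None:
        return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(d)]
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d)]


def trainable_params(d, k, r, train_both):
    """Trainable parameters for a d x k layer with a rank-r update."""
    return r * (d + k) if train_both else r * k


def loss_and_grad_A(W0, B, A, x, y):
    """Loss 0.5 * ||(W0 + B A) x - y||^2 and its gradient w.r.t. A only."""
    delta = matmul(B, A)
    W = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W0, delta)]
    pred = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    residual = [p - t for p, t in zip(pred, y)]
    loss = 0.5 * sum(e * e for e in residual)
    # dL/dA = B^T (residual) x^T; B itself receives no update.
    bt_res = [sum(b * e for b, e in zip(col, residual)) for col in zip(*B)]
    grad_A = [[g * xi for xi in x] for g in bt_res]
    return loss, grad_A


def sgd_step(A, grad_A, lr):
    """Plain gradient-descent update applied to the trainable factor."""
    return [[a - lr * g for a, g in zip(ra, rg)] for ra, rg in zip(A, grad_A)]


# Tiny example: 3 x 2 layer, rank 1, frozen deterministic B, zero-initialized A.
d, k, r = 3, 2, 1
W0 = [[0.0] * k for _ in range(d)]
B = fixed_factor(d, r)
A = [[0.0] * k for _ in range(r)]
x, y = [1.0, 2.0], [1.0, 0.0, 0.0]

loss_before, grad = loss_and_grad_A(W0, B, A, x, y)
A = sgd_step(A, grad, lr=0.1)
loss_after, _ = loss_and_grad_A(W0, B, A, x, y)
delta = matmul(B, A)  # every column of delta lies in the span of B's columns
```

In this sketch the frozen factor fixes the column space of the update, so only the rows of `delta` reached by `B` can change, and the trainable parameter count drops from `r * (d + k)` to `r * k`.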