Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA's structural bottleneck, which introduces gradient entanglement across unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA), which partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA's limitations, effectively increases the representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to a +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git.
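To make the partitioning idea concrete, the sketch below wraps a frozen linear layer with a k x k grid of independent low-rank adapters, one per weight sub-block. It is an illustrative reading of the description above; the class name `GraLoRALinear` and the hyperparameters `k`, `rank`, and `alpha` are our own placeholders, not the authors' API. The official implementation is in the repository linked above.

```python
# Minimal sketch of the GraLoRA idea: partition a weight matrix into a k x k
# grid of sub-blocks and give each sub-block its own low-rank adapter pair.
# Names and defaults here are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class GraLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, k: int = 2, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        assert base.out_features % k == 0 and base.in_features % k == 0
        self.base = base
        for p in self.base.parameters():      # keep the pretrained weight frozen
            p.requires_grad_(False)
        self.k = k
        self.block_out = base.out_features // k
        self.block_in = base.in_features // k
        self.scale = alpha / rank
        # One independent low-rank pair (A, B) per (row-block, column-block).
        A = torch.empty(k, k, rank, self.block_in)
        nn.init.kaiming_uniform_(A, a=5 ** 0.5)
        self.A = nn.Parameter(A)
        self.B = nn.Parameter(torch.zeros(k, k, self.block_out, rank))  # zero init => no initial update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input channels into k groups, one per column of sub-blocks.
        x_blocks = x.view(*x.shape[:-1], self.k, self.block_in)
        out_blocks = []
        for i in range(self.k):          # output (row) block index
            delta = 0.0
            for j in range(self.k):      # input (column) block index
                # (..., block_in) -> (..., rank) -> (..., block_out)
                h = torch.einsum('...d,rd->...r', x_blocks[..., j, :], self.A[i, j])
                delta = delta + torch.einsum('...r,or->...o', h, self.B[i, j])
            out_blocks.append(delta)
        # Frozen base projection plus the block-wise low-rank update.
        return self.base(x) + self.scale * torch.cat(out_blocks, dim=-1)


# Usage: wrap an existing projection, then fine-tune only the adapter tensors.
layer = GraLoRALinear(nn.Linear(4096, 4096), k=2, rank=16)
out = layer(torch.randn(1, 8, 4096))
```

In this reading, each input-channel group only feeds the adapters in its own column of sub-blocks, which is one way the block structure can limit the gradient entanglement across unrelated input channels described above.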