Low-Rank Adaptation (LoRA) leverages the low intrinsic rank of weight updates in Large Language Models (LLMs), establishing a Parameter-Efficient Fine-Tuning (PEFT) paradigm. However, LoRA suffers from slow convergence. We introduce Dimension-Sharding Adaptation (DiSHA), which expands the PEFT design space to unlock lower intrinsic ranks and faster convergence by default. Within DiSHA's design space, we propose Block Affine Adaptation (Bone), a computationally efficient structure that delivers both high performance and efficiency. Because certain DiSHA configurations can produce collinear updates across weight shards, we further propose Block Affine Transformation Adaptation (BAT), a nonlinear variant of DiSHA. BAT combines the trainable matrices with the original weight shards nonlinearly, inducing nonlinear matrix updates without introducing additional parameters. Empirical results show that Bone, under the DiSHA framework, consistently outperforms LoRA variants on both NLG and NLU tasks, with significantly improved computational efficiency. Further analysis demonstrates that BAT enhances model capabilities through its nonlinear design.
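To make the block-wise idea concrete, the following is a minimal NumPy sketch of one plausible reading of a block-affine update: a frozen weight is partitioned into square shards, and a single small trainable block modulates every shard. The shapes, the specific update `W_i + W_i @ B + B`, and the zero initialization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, b = 8, 4                      # weight dimension and shard (block) size; illustrative
W = rng.standard_normal((d, d))  # frozen pretrained weight
B = np.zeros((b, b))             # single trainable block, zero-initialized

def block_affine_update(W, B, b):
    """Apply an assumed block-affine update: each b x b shard W_i
    becomes W_i + W_i @ B + B. The product term makes the update a
    nonlinear function of the frozen shard, while only B (b*b values)
    is trainable, versus d*d for full fine-tuning."""
    W_new = W.copy()
    for i in range(0, W.shape[0], b):
        for j in range(0, W.shape[1], b):
            shard = W[i:i+b, j:j+b]
            W_new[i:i+b, j:j+b] = shard + shard @ B + B
    return W_new

# Zero-initializing B leaves the base model unchanged at the start of training.
assert np.allclose(block_affine_update(W, B, b), W)
```

Note how the `shard @ B` term ties each shard's update to that shard's own frozen values, so different shards receive different (non-collinear) updates even though they share the one trainable block.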