Low-Rank Adaptation (LoRA), a prominent technique within the Parameter-Efficient Fine-Tuning (PEFT) framework, reduces the computational burden of adapting Large Language Models (LLMs) to downstream tasks, enabling fine-tuning under resource constraints. However, existing research has shown that LoRA suffers from slow convergence. To address this limitation, we introduce Dimension-Sharding Adaptation (DiSHA), which expands the PEFT design space toward even fewer trainable parameters and faster convergence. Within DiSHA's design space, we propose Block Affine Efficient Computation (Bone), a computationally efficient structure that delivers both high performance and efficiency. Because certain DiSHA configurations can result in colinear updates to weight shards, we address this with Block Affine Transformation (Bat), a nonlinear variant of DiSHA. Bat combines trainable matrices with the original weight shards in a nonlinear manner, inducing nonlinearity in the matrix updates without introducing additional parameters. Empirical results show that Bone, under the DiSHA framework, consistently outperforms LoRA variants on both Natural Language Understanding and Natural Language Generation tasks, with significantly improved computational efficiency. Further analysis demonstrates that Bat enhances model capabilities through its nonlinear design.
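To make the "block affine" idea concrete, the following is a toy numpy sketch of a shard-wise affine update, where a frozen weight matrix is partitioned into square shards and a single shared trainable block both multiplies and translates each shard. The shard size, the square block shapes, and the exact update rule `W_i @ B + B` are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

# Toy sketch of a block-affine style update (assumed shapes; not the paper's code).
rng = np.random.default_rng(0)
m, n, r = 6, 6, 3                 # weight is (m, n); both divisible by shard size r
W = rng.standard_normal((m, n))   # frozen pretrained weight
B = np.zeros((r, r))              # single shared trainable block (r*r = 9 params,
                                  # vs r*(m+n) = 36 for a rank-r LoRA on this W)

# Partition W into (m/r) x (n/r) square shards of shape (r, r).
blocks = W.reshape(m // r, r, n // r, r).swapaxes(1, 2)   # (m/r, n/r, r, r)

# Affine update per shard: delta_i = W_i @ B + B -- a linear map of the shard
# plus a translation by the same trainable block (hence "block affine").
delta = blocks @ B + B            # B broadcasts over all shards

# Reassemble the updated weight; with B initialized to zero, W is unchanged.
W_new = (blocks + delta).swapaxes(1, 2).reshape(m, n)
```

Because the update mixes the frozen shard `W_i` into the trainable term multiplicatively, the resulting change to each shard depends on the pretrained weights themselves, which is how nonlinearity enters without adding parameters beyond `B`.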