The rapid advancement of large language models (LLMs) has come with a significant increase in parameter counts, presenting challenges for adaptation and fine-tuning. Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt LLMs to downstream tasks efficiently. In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. We introduce a method for analyzing parameter variation via singular value decomposition (SVD) and show that SORSA minimizes the alteration of singular values and vectors during adaptation. Each SORSA adapter consists of two parts: trainable principal singular weights $W_p = U_p \Sigma_p V^\top_p$ and frozen residual weights $W_r = U_r \Sigma_r V^\top_r$, both initialized by performing SVD on the pre-trained weights. We further implement and analyze an orthonormal regularizer, which effectively transfers scaling information into $\Sigma_p$ and makes training more efficient. SORSA adapters can be merged into the base weights before inference, eliminating any inference latency. In our experiments, SORSA converges faster than PiSSA and LoRA. On the MATH benchmark, Llama 2 7B adapted with SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), full fine-tuning (7.22%), and PiSSA (7.44%). On the GSM-8K benchmark, SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), full fine-tuning (49.05%), and PiSSA (53.07%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning and demonstrates remarkable performance. The code is available at https://github.com/Gunale0926/SORSA.
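As a minimal sketch of the initialization described above, the following NumPy code splits a pre-trained weight matrix via SVD into trainable principal weights $W_p$ and frozen residual weights $W_r$, and computes an orthonormality penalty on $U_p$ and $V_p^\top$. This is illustrative only: the function names and the exact form of the regularizer are our assumptions, not the paper's definitions.

```python
import numpy as np

def sorsa_init(W, r):
    """Split W into rank-r principal weights W_p and residual weights W_r
    using the top-r singular triplets (sketch of the SVD-based init)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    Up, Sp, Vpt = U[:, :r], S[:r], Vt[:r, :]          # trainable part
    Wr = U[:, r:] @ np.diag(S[r:]) @ Vt[r:, :]        # frozen residual
    Wp = Up @ np.diag(Sp) @ Vpt                        # W = Wp + Wr
    return (Up, Sp, Vpt), Wp, Wr

def orthonormal_reg(Up, Vpt):
    """Frobenius-norm penalty keeping the columns of U_p and rows of
    V_p^T orthonormal, so scaling information concentrates in Sigma_p
    (hypothetical form; the precise loss is given in the paper body)."""
    r = Up.shape[1]
    return (np.linalg.norm(Up.T @ Up - np.eye(r), "fro")
            + np.linalg.norm(Vpt @ Vpt.T - np.eye(r), "fro"))
```

At initialization the decomposition is exact ($W_p + W_r = W$) and the regularizer is zero, since SVD factors are orthonormal; during training the penalty keeps $U_p$ and $V_p^\top$ near-orthonormal while $\Sigma_p$ absorbs the scaling.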