In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. These parts are initialized by performing singular value decomposition (SVD) on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove decreases the condition number of $W_p$ and makes the optimization more efficient. SORSA adapters can be merged into the base weights during inference, thus eliminating any inference latency. We also introduce a method for analyzing parameter variation by performing SVD, and we discuss and analyze SORSA's advantage in minimizing alterations to the singular values and vectors. Finally, SORSA converges faster than LoRA and PiSSA in our experiments. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), AdaLoRA (47.30%), Full FT (49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), AdaLoRA (6.48%), Full FT (7.22%), and PiSSA (7.44%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance.
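The SVD-based initialization described above can be sketched as follows. This is a minimal illustration in NumPy, assuming a rank-$r$ split; the function names are hypothetical, and the regularizer shown is a standard Frobenius-norm penalty consistent with the orthonormality constraint described in the abstract, not the paper's exact implementation.

```python
import numpy as np

def sorsa_init(W, r):
    """Split a pretrained weight W into trainable principal factors
    (U_p, S_p, V_p^T) and a frozen residual W_r via SVD.
    Hypothetical helper name for illustration."""
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    # Principal part: top-r singular triplets (trainable).
    Up, Sp, Vhp = U[:, :r], S[:r], Vh[:r, :]
    # Residual part: remaining triplets, kept frozen.
    Wr = U[:, r:] @ np.diag(S[r:]) @ Vh[r:, :]
    return Up, Sp, Vhp, Wr

def orthonormal_reg(Up, Vhp):
    """Frobenius-norm penalty pushing U_p and V_p toward orthonormal
    columns: ||U_p^T U_p - I||_F^2 + ||V_p^T V_p - I||_F^2."""
    I = np.eye(Up.shape[1])
    return (np.linalg.norm(Up.T @ Up - I) ** 2
            + np.linalg.norm(Vhp @ Vhp.T - I) ** 2)
```

At initialization, $W_p + W_r$ reconstructs the pre-trained weight exactly, and the regularizer evaluates to zero because SVD factors are already orthonormal; training then updates only the principal factors while the penalty keeps them near orthonormality.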