The increasing size of large language models (LLMs) results in significant computational overhead and memory usage when adapting these models to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these costs by training a small set of parameters that encode task-specific updates to the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, and it has inspired a series of variants. However, LoRA and its successors do not account for knowledge that is noisy or irrelevant to the target task, which degrades model performance and leads to suboptimal results. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. We conduct extensive experiments across a range of LLMs on tasks spanning natural language understanding (NLU), natural language generation (NLG), instruction following, and commonsense reasoning. The experimental results demonstrate that KaSA consistently outperforms full fine-tuning (FFT) and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets, underscoring our method's efficacy and adaptability. The source code of our method is available at https://github.com/juyongjiang/KaSA.
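To make the core idea concrete, the following is a minimal NumPy sketch of SVD-based adaptation in the spirit described above. It is an illustration under assumed details, not the paper's implementation: the shapes, the truncation of the smallest singular components as "noisy" knowledge, and the zero-initialized task-specific singular values `sigma` are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: output size, input size, adaptation rank.
d_out, d_in, r = 8, 6, 2

# A frozen pretrained weight matrix (random stand-in).
W = rng.standard_normal((d_out, d_in))

# 1) SVD of the pretrained weight.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# 2) Drop the r smallest singular components, treating them as noisy or
#    task-irrelevant knowledge; the remaining part stays frozen.
W_world = U[:, :-r] @ np.diag(S[:-r]) @ Vt[:-r, :]

# 3) Trainable low-rank update in SVD form, with task-specific
#    "knowledge-aware" singular values sigma modulating each component.
delta_U = rng.standard_normal((d_out, r)) * 0.01
delta_Vt = rng.standard_normal((r, d_in)) * 0.01
sigma = np.zeros(r)  # zero-initialized so adaptation starts as a no-op

W_adapted = W_world + delta_U @ np.diag(sigma) @ delta_Vt
```

During fine-tuning only `delta_U`, `delta_Vt`, and `sigma` would be updated, so the number of trainable parameters is `r * (d_out + d_in + 1)` rather than `d_out * d_in`; at initialization the adapted weight equals the truncated pretrained weight.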