The increasing size of large language models (LLMs) results in significant computational overhead and memory usage when adapting these models to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these costs by training a small set of parameters that encode task-specific updates to the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, and it has inspired a series of variants. However, LoRA and its successors do not account for knowledge that is noisy or irrelevant to the target task, which degrades model performance and leads to suboptimal results. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. We conduct extensive experiments across a range of LLMs on tasks spanning natural language understanding (NLU), natural language generation (NLG), instruction following, and commonsense reasoning. The experimental results demonstrate that KaSA consistently outperforms full fine-tuning (FFT) and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets, underscoring our method's efficacy and adaptability. The source code of our method is available at https://github.com/juyongjiang/KaSA.
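To make the core idea concrete, the following is a minimal NumPy sketch of SVD-based adaptation in the spirit described above. It is an illustration under assumed details, not the paper's implementation: the shapes, the truncation of the smallest singular components as "noisy" knowledge, and the zero-initialized task-specific singular values `sigma` are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: output size, input size, adaptation rank.
d_out, d_in, r = 8, 6, 2

# A frozen pretrained weight matrix (random stand-in).
W = rng.standard_normal((d_out, d_in))

# 1) SVD of the pretrained weight.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# 2) Drop the r smallest singular components, treating them as noisy or
#    task-irrelevant knowledge; the remaining part stays frozen.
W_world = U[:, :-r] @ np.diag(S[:-r]) @ Vt[:-r, :]

# 3) Trainable low-rank update in SVD form, with task-specific
#    "knowledge-aware" singular values sigma modulating each component.
delta_U = rng.standard_normal((d_out, r)) * 0.01
delta_Vt = rng.standard_normal((r, d_in)) * 0.01
sigma = np.zeros(r)  # zero-initialized so adaptation starts as a no-op

W_adapted = W_world + delta_U @ np.diag(sigma) @ delta_Vt
```

During fine-tuning only `delta_U`, `delta_Vt`, and `sigma` would be updated, so the number of trainable parameters is `r * (d_out + d_in + 1)` rather than `d_out * d_in`; at initialization the adapted weight equals the truncated pretrained weight.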