With the rapid development of large language models (LLMs), full fine-tuning (FT) of these models is becoming increasingly infeasible due to its high computational demands. Moreover, FT increases the risk of catastrophic forgetting. As an alternative, Low-Rank Adaptation (LoRA) has been proposed. By fine-tuning only a small subset of parameters, LoRA achieves performance comparable to FT while significantly reducing resource requirements. However, since LoRA inherits FT's design, the problem of catastrophic forgetting persists. To address these limitations, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel PEFT variant designed to mitigate catastrophic forgetting while improving fine-tuning performance. Our method introduces a novel normalization technique, Sigmoid-based Magnitude Norm (S-MagNorm), which enhances parameter retention and fine-tuning efficiency. SECURA has been evaluated on a diverse range of tasks, including mathematical problem-solving (GSM8K), complex question-answering (CNNDM), translation (NewsDE), and complex multiple-choice reasoning (LogiQA). Experimental results demonstrate that it achieves an average fine-tuning improvement of 3.59% across four MCQ tasks and 2.51% across five QA tasks on Gemma2 2B, Qwen2 1.5B, Qwen2 7B, Llama3 8B, and Llama3.1 8B, outperforming DoRA. Additionally, SECURA demonstrates superior knowledge retention, achieving state-of-the-art performance in 16 continual learning tests and maintaining more than 70% accuracy on LLMs' basic knowledge, compared against Experience Replay (ER), sequential learning (SEQ), Elastic Weight Consolidation (EWC), I-LoRA, and CUR-LoRA.
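The abstract names CUR decomposition as SECURA's building block without detailing it. As background, here is a minimal NumPy sketch of a standard CUR approximation: a weight matrix W is approximated by actual columns C and rows R of W, linked by a small core U computed from pseudo-inverses. The top-norm column/row selection below is an illustrative assumption; SECURA's actual selection rule and its sigmoid-based normalization are not specified in the abstract and are not reproduced here.

```python
import numpy as np

# Build a 64x64 matrix with a decaying spectrum, standing in for a
# pretrained weight matrix W (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)) @ np.diag(np.linspace(1.0, 0.01, 64)) @ rng.normal(size=(64, 64))

k = 16  # rank of the CUR approximation
# Pick the k columns and rows of W with the largest Euclidean norm
# (one common heuristic; the paper may use a different criterion).
cols = np.argsort(-np.linalg.norm(W, axis=0))[:k]
rows = np.argsort(-np.linalg.norm(W, axis=1))[:k]
C = W[:, cols]          # (64, k): actual columns of W
R = W[rows, :]          # (k, 64): actual rows of W
U = np.linalg.pinv(C) @ W @ np.linalg.pinv(R)  # (k, k) core minimizing ||W - C U R||_F

rel_err = np.linalg.norm(W - C @ U @ R) / np.linalg.norm(W)
print(f"relative CUR error at rank {k}: {rel_err:.3f}")
```

Unlike LoRA's freshly initialized low-rank factors, C and R here are slices of the pretrained weights themselves, which is the property CUR-based adapters exploit to preserve prior knowledge while only the small core is updated.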