With the rapid development of large language models (LLMs), full fine-tuning (FT) of these models is becoming increasingly infeasible due to its high computational demands. Moreover, FT increases the risk of catastrophic forgetting. As an alternative, Low-Rank Adaptation (LoRA) has been proposed. By fine-tuning only a small subset of parameters, LoRA achieves performance comparable to FT while significantly reducing resource requirements. However, since LoRA inherits FT's design, the problem of catastrophic forgetting persists. To address these limitations, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel PEFT variant designed to mitigate catastrophic forgetting while improving fine-tuning performance. Our method introduces a novel normalization technique, Sigmoid-based Magnitude Norm (S-MagNorm), which enhances parameter retention and fine-tuning efficiency. SECURA has been evaluated on a diverse range of tasks, including mathematical problem-solving (GSM8K), complex question-answering (CNNDM), translation (NewsDE), and complex multiple-choice reasoning (LogiQA). Experimental results demonstrate that it achieves an average fine-tuning improvement of 3.59% across four MCQ tasks and 2.51% across five QA tasks on Gemma2 2B, Qwen2 1.5B, Qwen2 7B, Llama3 8B, and Llama3.1 8B, outperforming DoRA. Additionally, SECURA demonstrates superior knowledge retention, achieving state-of-the-art performance in 16 continual learning tests and maintaining more than 70% accuracy on LLMs' basic knowledge, compared against Experience Replay (ER), sequential learning (SEQ), Elastic Weight Consolidation (EWC), I-LoRA, and CUR-LoRA.
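The abstract names CUR decomposition as SECURA's building block without detailing it. As background, here is a minimal NumPy sketch of a standard CUR approximation: a weight matrix W is approximated by actual columns C and rows R of W, linked by a small core U computed from pseudo-inverses. The top-norm column/row selection below is an illustrative assumption; SECURA's actual selection rule and its sigmoid-based normalization are not specified in the abstract and are not reproduced here.

```python
import numpy as np

# Build a 64x64 matrix with a decaying spectrum, standing in for a
# pretrained weight matrix W (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)) @ np.diag(np.linspace(1.0, 0.01, 64)) @ rng.normal(size=(64, 64))

k = 16  # rank of the CUR approximation
# Pick the k columns and rows of W with the largest Euclidean norm
# (one common heuristic; the paper may use a different criterion).
cols = np.argsort(-np.linalg.norm(W, axis=0))[:k]
rows = np.argsort(-np.linalg.norm(W, axis=1))[:k]
C = W[:, cols]          # (64, k): actual columns of W
R = W[rows, :]          # (k, 64): actual rows of W
U = np.linalg.pinv(C) @ W @ np.linalg.pinv(R)  # (k, k) core minimizing ||W - C U R||_F

rel_err = np.linalg.norm(W - C @ U @ R) / np.linalg.norm(W)
print(f"relative CUR error at rank {k}: {rel_err:.3f}")
```

Unlike LoRA's freshly initialized low-rank factors, C and R here are slices of the pretrained weights themselves, which is the property CUR-based adapters exploit to preserve prior knowledge while only the small core is updated.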