Pre-trained language models, trained on large-scale corpora, demonstrate strong generalizability across various NLP tasks. Fine-tuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. However, during inference, the product of these matrices updates all pre-trained parameters, complicating tasks like knowledge editing that require selective updates. To address this challenge, we propose a novel PEFT method that conducts \textbf{r}ow and c\textbf{o}lumn-wise spar\textbf{se} \textbf{lo}w-\textbf{r}ank \textbf{a}daptation (RoseLoRA). RoseLoRA identifies and updates only the parameters most important for a specific task, maintaining efficiency while preserving the model's other knowledge. By imposing a sparsity constraint on the product of the low-rank matrices and converting it to row- and column-wise sparsity, we ensure efficient and precise model updates. Our theoretical analysis guarantees a lower bound on the sparsity of the matrix product. Extensive experiments on five benchmarks comprising twenty datasets demonstrate that RoseLoRA outperforms baselines in both general fine-tuning and knowledge editing tasks.
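The core mechanism above, imposing row- and column-wise sparsity on the LoRA factors so that their product updates only a small set of entries, can be sketched as follows. This is a minimal illustration, assuming a simple magnitude-based top-k selection (the method itself uses task-specific importance scores); the matrix names and the budget of two entries per row/column are illustrative choices, not the paper's settings.

```python
import numpy as np

def rowwise_topk(M, k):
    """Keep the k largest-magnitude entries in each row of M; zero the rest."""
    out = np.zeros_like(M)
    idx = np.argsort(-np.abs(M), axis=1)[:, :k]   # indices of top-k per row
    rows = np.arange(M.shape[0])[:, None]
    out[rows, idx] = M[rows, idx]
    return out

rng = np.random.default_rng(0)
d, r, k_dim = 16, 4, 16
B = rng.normal(size=(d, r))       # LoRA down-projection factor
A = rng.normal(size=(r, k_dim))   # LoRA up-projection factor

# Row-wise sparsity on A; column-wise sparsity on B
# (applied via top-k on the rows of B's transpose).
A_sparse = rowwise_topk(A, 2)
B_sparse = rowwise_topk(B.T, 2).T

# The resulting low-rank update touches only a few entries of the
# pre-trained weight matrix, instead of all d * k_dim of them.
delta_W = B_sparse @ A_sparse
print("fraction of untouched weights:", np.mean(delta_W == 0))
```

With two nonzeros per row of A and per column of B, each of the r rank-one terms in the product can touch at most 2 × 2 entries, so the 16 × 16 update has at most 16 nonzero entries; this is the flavor of the lower-bound guarantee stated in the abstract.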