Efficient finetuning of large language models (LLMs) aims to adapt LLMs at reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with Gaussian and zero values, respectively, while keeping the original weight matrices frozen. However, trainable parameters optimized in an unguided subspace may interfere with the well-learned subspace of the pretrained weight matrix. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that updates only the minor singular components of the weight matrix while keeping the principal singular components frozen. It is observed that the minor matrix corresponds to noisy or long-tail information, whereas the principal matrix contains important knowledge. MiLoRA initializes the low-rank matrices within a subspace that is orthogonal to the principal matrix, so the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes full use of this less-optimized subspace to learn the finetuning dataset. Extensive experiments on commonsense reasoning, math reasoning, and instruction-following benchmarks demonstrate the superior performance of our method.
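The initialization described above can be sketched in NumPy for a single weight matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a thin SVD, treats the largest k − r singular triplets as the frozen principal part and the smallest r as the minor part, and splits the minor singular values as square roots between the two adapter factors. The helper name `milora_init` and the factor layout (`A` of shape m×r, `B` of shape r×n, effective weight `W_p + A @ B`) are hypothetical choices made for this sketch.

```python
import numpy as np


def milora_init(W: np.ndarray, r: int):
    """Split W into a frozen principal part and a low-rank minor adapter.

    Hypothetical helper for illustration. The square-root split of the
    minor singular values between A and B is an assumption of this sketch.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = S.shape[0]
    if not 0 < r <= k:
        raise ValueError("rank r must satisfy 0 < r <= min(W.shape)")

    # Principal components: the k - r largest singular triplets (kept frozen).
    U_p, S_p, Vt_p = U[:, : k - r], S[: k - r], Vt[: k - r, :]
    W_p = (U_p * S_p) @ Vt_p

    # Minor components: the r smallest singular triplets (initialize the adapter).
    U_m, S_m, Vt_m = U[:, k - r :], S[k - r :], Vt[k - r :, :]
    sqrt_S_m = np.sqrt(S_m)
    A = U_m * sqrt_S_m            # (m, r): scales each column of U_m
    B = sqrt_S_m[:, None] * Vt_m  # (r, n): scales each row of Vt_m
    return W_p, A, B, U_p


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W_p, A, B, U_p = milora_init(W, r=4)

# At initialization, frozen part + adapter reproduces the pretrained weight.
print(np.allclose(W_p + A @ B, W))
# The adapter's column space is orthogonal to the principal subspace.
print(np.allclose(U_p.T @ A, 0.0))
```

In this sketch only `A` and `B` would receive gradients during finetuning while `W_p` stays fixed, so updates start in the subspace spanned by the minor singular vectors rather than in a randomly initialized one.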