Low-rank adaptation (LoRA) is a popular parameter-efficient fine-tuning method for large language models (LLMs). In this paper, we analyze the impact of low-rank updating as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding parameter-free operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the learned update can be merged back into the LLM's weights, so our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
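The parameter budget and the merge property can be illustrated with a small, self-contained sketch. A rank-r LoRA update on a d × k weight trains r(d + k) parameters. A square matrix of side r̂ = ⌊√(r(d + k))⌋ fits the same budget. The compression and decompression operators used here are an assumption chosen for illustration and are not necessarily the paper's exact definition. Compression splits the input into chunks of length r̂, and decompression concatenates the per-chunk outputs. The sketch also assumes d = k and that k is divisible by r̂. All function names (`lora_params`, `mora_side`, `mora_forward`, `merged_delta_w`) and all sizes are hypothetical.

```python
import math
from fractions import Fraction


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update B @ A (B: d x r, A: r x k)."""
    return r * (d + k)


def mora_side(d: int, k: int, r: int) -> int:
    """Side r_hat of the largest square matrix within the same parameter budget."""
    return math.isqrt(r * (d + k))


def matvec(m, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def mora_forward(m, x):
    """Assumed operators: compress x into chunks of length r_hat,
    apply the square matrix m to each chunk, then concatenate (decompress)."""
    r_hat = len(m)
    assert len(x) % r_hat == 0, "sketch assumes k is divisible by r_hat"
    out = []
    for start in range(0, len(x), r_hat):
        out.extend(matvec(m, x[start:start + r_hat]))
    return out


def merged_delta_w(m, k):
    """Dense k x k matrix equal to mora_forward: block-diagonal copies of m.
    Because the operators are linear, this can be added to the frozen weight."""
    r_hat = len(m)
    w = [[0] * k for _ in range(k)]
    for b in range(0, k, r_hat):
        for i in range(r_hat):
            for j in range(r_hat):
                w[b + i][b + j] = m[i][j]
    return w


def rank(mat):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in mat]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Parameter budget with hypothetical 4096 x 4096 weights and LoRA rank 8.
print(lora_params(4096, 4096, 8))      # 65536
print(mora_side(4096, 4096, 8) ** 2)   # 65536 (r_hat = 256)

# Toy merge check: d = k = 8, so a rank-1 LoRA budget (16 params) buys a 4 x 4 matrix.
m = [[1, 2, 0, 1], [0, 3, 1, 0], [0, 0, 2, 5], [0, 0, 0, 4]]
x = [1, 2, 3, 4, 5, 6, 7, 8]
delta_w = merged_delta_w(m, 8)
print(mora_forward(m, x) == matvec(delta_w, x))  # True: the update can be merged
print(rank(delta_w))                              # 8, versus rank <= 1 for LoRA
```

With the same 16 trainable parameters, the toy square-matrix update reaches rank 8, while a rank-1 LoRA update cannot exceed rank 1. The chunk-and-concatenate operators are only one simple linear choice. Any linear compression and decompression pair would preserve the merge property shown by the equality check.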