Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but it still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which reduces the number of trainable parameters by 10x compared to LoRA while maintaining the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its application to instruction following with just 1.4M trainable parameters using the Llama2 7B model.
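The mechanism described above (one shared pair of low-rank matrices, plus small per-layer scaling vectors) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the shared matrices are frozen and randomly initialized, that one vector scales the rank dimension and another scales the output dimension, and that the vectors start at 0.1 and 0.0 respectively. The names `VeraLayer`, `matvec`, `random_matrix`, and `lora_trainable_params`, as well as the toy sizes, are hypothetical and chosen only for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a random matrix; the Gaussian scaling is an assumption of this sketch."""
    return [[rng.gauss(0.0, 1.0) / cols ** 0.5 for _ in range(cols)]
            for _ in range(rows)]


class VeraLayer:
    """One adapted layer: a frozen weight plus a vector-scaled shared low-rank update."""

    def __init__(self, weight, shared_a, shared_b):
        self.weight = weight        # frozen pretrained weight, shape (d_out, d_in)
        self.shared_a = shared_a    # frozen, shared by all layers, shape (rank, d_in)
        self.shared_b = shared_b    # frozen, shared by all layers, shape (d_out, rank)
        # Trainable scaling vectors (initial values are assumptions).
        self.d = [0.1] * len(shared_a)   # one scale per rank dimension
        self.b = [0.0] * len(shared_b)   # one scale per output dimension

    def forward(self, x):
        base = matvec(self.weight, x)
        # Project down with A, then scale each rank dimension by d.
        low = [d_i * a_i for d_i, a_i in zip(self.d, matvec(self.shared_a, x))]
        # Project up with B, then scale each output dimension by b.
        delta = [b_i * u_i for b_i, u_i in zip(self.b, matvec(self.shared_b, low))]
        return [h + dl for h, dl in zip(base, delta)]

    def trainable_params(self):
        return len(self.d) + len(self.b)


def lora_trainable_params(d_in, d_out, rank):
    """LoRA trains both low-rank matrices in every adapted layer."""
    return rank * (d_in + d_out)


rng = random.Random(0)
d_in, d_out, rank, n_layers = 8, 8, 4, 3   # toy sizes, not taken from the paper

# A single pair of low-rank matrices, created once and reused by every layer.
shared_a = random_matrix(rank, d_in, rng)
shared_b = random_matrix(d_out, rank, rng)
layers = [VeraLayer(random_matrix(d_out, d_in, rng), shared_a, shared_b)
          for _ in range(n_layers)]

x = [1.0] * d_in
vera_total = sum(layer.trainable_params() for layer in layers)
lora_total = n_layers * lora_trainable_params(d_in, d_out, rank)
print(vera_total, lora_total)   # prints "36 192" for these toy sizes
```

In this sketch each layer trains only `rank + d_out` values, whereas a LoRA layer of the same rank trains `rank * (d_in + d_out)`. The toy ratio is not meant to reproduce the 10x figure reported above, which depends on the model dimensions and the ranks chosen for each method.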