Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but it still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks and on image classification tasks, and show its application in instruction-tuning of 7B and 13B language models.
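To make the parameter saving concrete, the following is a minimal sketch, not the authors' implementation. It assumes the shared pair of matrices (called `A` and `B` here) is frozen and random, and that each layer trains one scaling vector of length `rank` and one of length `d_out`. The placement of the two vectors, their initial values, and all function names are illustrative assumptions that the abstract does not specify.

```python
import random

def lora_trainable_params(num_layers, d_in, d_out, rank):
    # LoRA trains a separate pair per adapted layer:
    # A (rank x d_in) and B (d_out x rank).
    return num_layers * rank * (d_in + d_out)

def vera_trainable_params(num_layers, d_out, rank):
    # Assumed layout: each layer trains only two vectors, one of length
    # rank and one of length d_out. The shared matrices are frozen.
    return num_layers * (rank + d_out)

def matvec(M, x):
    # Plain matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def vera_delta(A, B, d, b, x):
    # Assumed update: diag(b) @ B @ diag(d) @ A @ x
    low = [di * hi for di, hi in zip(d, matvec(A, x))]
    return [bi * yi for bi, yi in zip(b, matvec(B, low))]

rng = random.Random(0)
d_in, d_out, rank, num_layers = 8, 8, 4, 3

# One frozen random pair, reused by every layer.
A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]

# Per-layer trainable vectors. The initial values are an assumption:
# b starts at zero so the adapted model initially equals the base model.
layers = [{"d": [0.1] * rank, "b": [0.0] * d_out} for _ in range(num_layers)]

x = [rng.gauss(0, 1) for _ in range(d_in)]
delta = vera_delta(A, B, layers[0]["d"], layers[0]["b"], x)

lora_count = lora_trainable_params(num_layers, d_in, d_out, rank)
vera_count = vera_trainable_params(num_layers, d_out, rank)
```

Under these assumptions the LoRA count grows with `rank * (d_in + d_out)` per layer, while the VeRA count grows only with `rank + d_out`, which is where the storage saving for many per-user or per-task adapters would come from.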