As the adoption of large language models increases and the need for per-user or per-task model customization grows, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce the number of stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers by sharing parameters globally via a \textit{vector bank}. As an instantiation of the paradigm for LoRA, our proposed VB-LoRA composes \textit{all} the low-rank matrices of LoRA from a shared \textit{vector bank} with a differentiable top-$k$ admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining performance comparable to or better than state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, and instruction tuning tasks. When fine-tuning the Llama2-13B model, VB-LoRA uses only 0.4\% of LoRA's stored parameters yet attains superior results. Our source code is available at \url{https://github.com/leo-yangli/VB-LoRA}.
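To make the "divide-and-share" idea concrete, the following is a minimal sketch of how one low-rank factor might be composed from a shared vector bank. It is not the authors' implementation (see the repository above for that). It assumes each factor is split into sub-vectors of length `b`, and that each sub-vector is a softmax-weighted sum of the `k` bank vectors with the largest logits. The function names and the values of `h`, `b`, `k`, `rows`, and `cols` are hypothetical and chosen only for illustration. Only the forward pass is shown; a real version would use an autodiff framework so that the bank and the logits can be trained.

```python
import math
import random


def top_k_admixture(logits, bank, k):
    """Compose one sub-vector as a softmax-weighted sum of the k bank
    vectors whose logits are largest (assumed form of the admixture)."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected logits.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    b = len(bank[0])
    return [sum(w * bank[i][j] for w, i in zip(weights, top)) for j in range(b)]


def compose_matrix(logit_rows, bank, k, rows, cols):
    """Build a rows x cols matrix by concatenating composed sub-vectors."""
    flat = []
    for logits in logit_rows:
        flat.extend(top_k_admixture(logits, bank, k))
    assert len(flat) == rows * cols, "sub-vectors must tile the matrix exactly"
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


random.seed(0)
h, b, k = 8, 4, 2      # bank size, sub-vector length, top-k (illustrative)
rows, cols = 2, 8      # shape of one low-rank factor (illustrative)

# The bank is shared by every factor; only the logits are specific to a factor.
bank = [[random.gauss(0, 1) for _ in range(b)] for _ in range(h)]
n_sub = rows * cols // b
logit_rows = [[random.gauss(0, 1) for _ in range(h)] for _ in range(n_sub)]

A = compose_matrix(logit_rows, bank, k, rows, cols)
```

In this sketch every low-rank factor would draw from the same `bank`, so the bank's size is independent of the number of factors, while each factor contributes only its own selection logits.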