It is common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. For Large Language Models, however, fine-tuning the entire model is computationally expensive and highly energy-intensive. As a result, several Parameter-Efficient Fine-Tuning (PEFT) approaches have recently been proposed. One of the most popular is low-rank adaptation (LoRA), whose key insight is to decompose the update weights of the pre-trained model into two low-rank matrices. However, existing approaches either use the same rank value across all weight matrices, which has been shown to be a sub-optimal choice, or do not apply any quantization technique, one of the most important factors for a model's energy consumption. In this work, we propose Bayesian-LoRA (B-LoRA), which approaches low-rank adaptation and quantization from a Bayesian perspective by placing prior distributions on both the quantization levels and the rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task while finding the optimal rank value and quantization level for every low-rank matrix. We validate the proposed method by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing that the proposed approach learns optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.