Recent advancements in Large Language Models (LLMs) have achieved robust performance across diverse tasks, but fine-tuning these models for specific domains remains resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) address this challenge by fine-tuning only a small subset of parameters. However, existing methods for fusing multiple LoRAs lack dynamic fusion based on contextual inputs and often increase inference time due to token-level operations. We propose DLP-LoRA, a Dynamic Lightweight Plugin that employs a mini-MLP module with only 5M parameters to dynamically fuse multiple LoRAs at the sentence level using a top-p sampling strategy. By leveraging parallel computation, this approach keeps inference time under twice that of single-LoRA inference. Evaluations across 26 tasks, including multiple-choice questions and question answering, demonstrate that DLP-LoRA achieves an average accuracy of 92.34% on multiple-choice datasets and significant improvements in BLEU and ROUGE scores on QA datasets, outperforming baselines across different LLM backbones under composite task settings. DLP-LoRA effectively balances performance and efficiency, making it a practical solution for dynamic multi-task adaptation in LLMs. Our code is available at https://github.com/MeCuping/DLP-LoRA.
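To make the fusion mechanism concrete, below is a minimal, hypothetical sketch of sentence-level LoRA fusion with top-p selection. All names (`top_p_weights`, `fuse_loras`, the tiny two-layer router) and the toy dimensions are illustrative assumptions, not the paper's actual implementation: a small router produces a probability over adapters from a sentence embedding, top-p sampling keeps the smallest high-probability subset, and only those adapters' low-rank updates are merged into the base weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters = 16, 4, 5  # hidden size, LoRA rank, adapter count (toy sizes)

# Toy LoRA adapters: each contributes a low-rank update B @ A to the base weight.
loras = [(rng.standard_normal((d, r)) * 0.01, rng.standard_normal((r, d)) * 0.01)
         for _ in range(n_adapters)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def top_p_weights(logits, p=0.9):
    """Keep the smallest set of adapters whose cumulative probability
    reaches p, renormalize within that set, and zero out the rest."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]          # adapters by descending probability
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, p)) + 1     # number of adapters kept
    w = np.zeros_like(probs)
    kept = order[:k]
    w[kept] = probs[kept] / probs[kept].sum()
    return w

def fuse_loras(sentence_emb, router_params, base_W, p=0.9):
    """Route once per sentence: mini-MLP logits -> top-p weights -> fused delta."""
    W1, W2 = router_params
    logits = np.tanh(sentence_emb @ W1) @ W2           # tiny 2-layer MLP router
    w = top_p_weights(logits, p)
    delta = sum(wi * (B @ A) for wi, (B, A) in zip(w, loras) if wi > 0)
    return base_W + delta, w

# Hypothetical stand-ins for trained parameters and a pooled sentence embedding.
router_params = (rng.standard_normal((d, 8)), rng.standard_normal((8, n_adapters)))
base_W = rng.standard_normal((d, d))
sentence_emb = rng.standard_normal(d)  # e.g. mean-pooled token embeddings

fused_W, weights = fuse_loras(sentence_emb, router_params, base_W)
```

Because the routing decision is made once per sentence rather than per token, the fused weight `fused_W` can be computed up front and reused for the whole sequence, which is consistent with the abstract's claim of limited inference overhead.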