Recently, LoRA has emerged as a crucial technique for fine-tuning large pre-trained models, yet its performance often falls short in multi-task learning scenarios. The MoE architecture offers a natural remedy, but it introduces challenges of its own: mutual interference among data from multiple domains, forgetting of knowledge across tasks, and a significant increase in parameter count that raises computational cost. Therefore, in this paper, we propose MoSLD, a mixture-of-shared-LoRAs model with a dropout strategy. MoSLD addresses these challenges by sharing the upper projection matrix in LoRA among different experts, encouraging the model to learn general knowledge across tasks, while allowing the lower projection matrix to focus on the unique features of each task. Applying dropout alleviates the imbalanced updating of the parameter matrices and mitigates parameter overfitting in LoRA. Extensive experiments demonstrate that our model performs excellently in both single-task and multi-task scenarios, with robust out-of-domain generalization capabilities.
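The core idea above can be illustrated with a minimal NumPy sketch. We assume the shared "upper" projection corresponds to LoRA's down-projection factor `A_shared` and the per-expert "lower" projections to the `B` matrices, with a softmax gate over experts and dropout applied on the shared path; all names (`moslld_delta`, `router_logits`, `p_drop`) are hypothetical and the dropout placement is an assumption, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 3  # hidden size, LoRA rank, number of experts

# One projection matrix shared by all experts (general cross-task knowledge).
A_shared = rng.normal(0.0, 0.02, (r, d))

# Per-expert projection matrices (task-specific features); zero-initialized
# as in standard LoRA so the adapter initially contributes nothing.
B_experts = [np.zeros((d, r)) for _ in range(n_experts)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moslld_delta(x, router_logits, p_drop=0.1, training=False):
    """Mixture-of-shared-LoRAs update for one token: each expert i contributes
    B_i @ (A_shared @ x), weighted by the router's gate; inverted dropout is
    applied to the shared intermediate during training (a sketch only)."""
    gate = softmax(router_logits)
    h = A_shared @ x  # shared down-projection, reused by every expert
    if training:
        mask = rng.random(h.shape) >= p_drop
        h = h * mask / (1.0 - p_drop)  # inverted dropout on the shared path
    return sum(g * (B @ h) for g, B in zip(gate, B_experts))

x = rng.normal(size=d)
delta = moslld_delta(x, router_logits=np.array([0.5, 0.1, -0.2]))
```

Because the `B` matrices start at zero, the adapter's initial output is the zero vector, so fine-tuning begins from the frozen base model's behavior; only the per-expert `B` matrices (and the single shared `A_shared`) are trained, which is how the design keeps parameter count well below a full mixture of independent LoRA pairs.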