The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training under limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter-\textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xperts (MoE) paradigm for improving fine-tuning and parameter efficiency in multi-task learning. The paradigm improves the model's multi-task capability by training universal experts, domain-specific experts, and routers separately. MoDULA-Res, a new method within the MoDULA paradigm, maintains the model's general capability by connecting universal and task-specific experts through residual connections. Experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves significant performance improvements across multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA offers flexible pluggability, allowing new tasks to be added efficiently without retraining existing experts from scratch. This progressive training paradigm circumvents data-balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
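To make the architecture concrete, the forward pass described above (a frozen base weight, a residually-added universal LoRA expert, and router-gated domain-specific LoRA experts) can be sketched as follows. This is a minimal illustrative sketch in NumPy only; all dimensions, variable names, and the exact gating scheme are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
d_in, d_out, r, n_experts = 16, 16, 4, 3
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d_out, d_in))           # frozen pretrained weight
A_u = rng.standard_normal((r, d_in))              # universal LoRA expert
B_u = np.zeros((d_out, r))                        # B is zero-initialized, as in LoRA
A_e = rng.standard_normal((n_experts, r, d_in))   # domain-specific LoRA experts
B_e = np.zeros((n_experts, d_out, r))
W_r = rng.standard_normal((n_experts, d_in))      # router weights (assumed linear+softmax)

def modula_res_forward(x):
    """Sketch of a MoDULA-Res-style forward pass: base output plus a
    residually-connected universal expert plus router-gated domain experts."""
    gate = np.exp(W_r @ x)
    gate /= gate.sum()                            # softmax over domain experts
    universal = B_u @ (A_u @ x)                   # universal expert, residual path
    domain = sum(g * (B_e[i] @ (A_e[i] @ x)) for i, g in enumerate(gate))
    return W0 @ x + universal + domain

x = rng.standard_normal(d_in)
y = modula_res_forward(x)
```

Because the LoRA `B` matrices are zero-initialized, the module initially reproduces the frozen base output exactly, which is one reason residually-attached experts preserve general capability at the start of fine-tuning. The separate parameter groups (`A_u`/`B_u`, `A_e`/`B_e`, `W_r`) also make the paradigm's staged training and pluggability plausible: a new domain expert is a new slice of `A_e`/`B_e` plus a router row, with existing experts untouched.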