Fine-tuning is often necessary to adapt Large Language Models (LLMs) to downstream tasks. However, updating billions of parameters demands substantial computational resources and training time, which is a major obstacle to deploying large-scale models widely. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a prominent paradigm for addressing this issue. Yet current PEFT approaches that rely on a limited set of global parameters (such as LoRA, which adds low-rank approximation matrices to all weights) struggle to flexibly combine different computational modules for downstream tasks. In this work, we introduce a novel PEFT method, MoELoRA. We treat LoRA as a Mixture of Experts (MoE) and, to mitigate the random routing phenomenon observed in MoE, propose using contrastive learning to encourage experts to learn distinct features. We conducted experiments on 11 tasks from math reasoning and common-sense reasoning benchmarks. With the same number of parameters, our approach significantly outperforms LoRA. In math reasoning, MoELoRA achieves an average performance 4.2% higher than LoRA and is competitive with the 175B GPT-3.5 on several benchmarks.
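The abstract gives only the idea, not the architecture, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It treats several LoRA adapters as experts mixed by a learned gate on top of a frozen weight, and adds an InfoNCE-style contrastive term in which features routed to the same expert are positives and features from other experts are negatives. The class name `MoELoRALayer`, the function `expert_contrastive_loss`, top-k gating, the zero initialization of `B`, and every hyperparameter are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MoELoRALayer:
    """Frozen weight W plus a gated mixture of low-rank (LoRA) experts.

    Illustrative only: the shapes, the top-k gate, and the initialization
    are assumptions, not details taken from the abstract.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out)) * 0.02             # frozen base weight
        self.A = rng.normal(size=(num_experts, d_in, rank)) * 0.02  # down-projections
        self.B = np.zeros((num_experts, rank, d_out))              # up-projections, zero init
        self.gate = rng.normal(size=(d_in, num_experts)) * 0.02     # router weights
        self.top_k = top_k

    def forward(self, x):
        # x has shape (tokens, d_in).
        logits = x @ self.gate                                      # (tokens, experts)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]
        mask = np.full_like(logits, -np.inf)
        np.put_along_axis(mask, top_idx, 0.0, axis=-1)
        weights = softmax(logits + mask)                            # zero outside the top-k
        # Per-expert low-rank update x @ A_e @ B_e, shape (tokens, experts, d_out).
        expert_out = np.einsum("td,edr,ero->teo", x, self.A, self.B)
        mixed = np.einsum("te,teo->to", weights, expert_out)
        return x @ self.W + mixed, weights, expert_out


def expert_contrastive_loss(features, expert_ids, temperature=0.1):
    """InfoNCE-style loss: same-expert features attract, others repel."""
    expert_ids = np.asarray(expert_ids)
    z = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = z @ z.T / temperature
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)                         # never compare with self
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (expert_ids[:, None] == expert_ids[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0                                                  # no positive pairs available
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]).mean())
```

In a training loop one would presumably add this term to the task loss with a weighting coefficient and update only `A`, `B`, and `gate` while `W` stays frozen. Because `B` starts at zero in this sketch, the layer initially reproduces the frozen model's output, mirroring the usual LoRA initialization.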