Despite many efforts, balancing the training budget, downstream performance, and general capabilities of LLMs remains a challenge in many applications. Training the whole model for downstream tasks is expensive and can easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) reduces the training cost, but it still suffers from forgetting and limits learning on the downstream tasks. To efficiently fine-tune LLMs with fewer limitations on their downstream performance while mitigating the forgetting of general capabilities, we propose a novel mixture-of-experts (MoE) framework based on Soft LoRA and Identity Mixture (SLIM), which allows dynamic routing between LoRA adapters and skip connections, enabling the suppression of forgetting. We adopt weight yielding with sliding clustering for better out-of-domain detection to enhance the routing. We also propose converting the mixture of low-rank adapters into a model-merging formulation and introduce fast dynamic merging of LoRA adapters to preserve the general capabilities of the base model. Extensive experiments demonstrate that the proposed SLIM is comparable to state-of-the-art PEFT approaches on downstream tasks while achieving leading performance in mitigating catastrophic forgetting.
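The core mechanism the abstract describes can be sketched roughly as follows: a router assigns per-token weights over a set of LoRA adapters plus one identity (skip) expert; tokens routed to the identity expert pass through the frozen base weight unchanged, which is what suppresses forgetting. This is a minimal, hypothetical NumPy sketch, not the paper's implementation; all names (`SLIMLayer`, `n_lora`, `rank`) and the softmax router are illustrative assumptions, and the paper's weight yielding with sliding clustering and dynamic merging are not modeled here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SLIMLayer:
    """Hypothetical sketch: route between LoRA experts and an identity expert."""
    def __init__(self, d_in, d_out, n_lora=2, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.02, (d_in, d_out))         # frozen base weight
        self.A = rng.normal(0, 0.02, (n_lora, d_in, rank))  # LoRA down-projections
        self.B = np.zeros((n_lora, rank, d_out))            # LoRA up-projections (zero-init, as in LoRA)
        self.router = rng.normal(0, 0.02, (d_in, n_lora + 1))  # +1 gate for the identity expert

    def forward(self, x):
        base = x @ self.W                       # frozen base path, always active
        gates = softmax(x @ self.router)        # (batch, n_lora + 1) per-token routing weights
        # Each LoRA expert e contributes a delta x @ A_e @ B_e; the identity
        # expert (last gate) contributes no delta, so its mass suppresses updates.
        deltas = np.einsum('bi,eir,erd->bed', x, self.A, self.B)
        delta = np.einsum('be,bed->bd', gates[:, :-1], deltas)
        return base + delta
```

Because the `B` matrices are zero-initialized, the layer initially reproduces the frozen base output exactly, regardless of routing; fine-tuning then moves only the adapters and router while the base weight stays fixed.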