Although many efforts have been made, it remains a challenge to balance the training budget, downstream performance, and the general capabilities of LLMs across applications. Training the whole model for downstream tasks is expensive and can easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) reduces the training cost, but it still suffers from forgetting and limits learning on downstream tasks. To efficiently fine-tune LLMs with fewer constraints on their downstream performance while mitigating the forgetting of general capabilities, we propose a novel mixture-of-experts (MoE) framework based on Soft LoRA and Identity Mixture (SLIM). SLIM allows dynamic routing between LoRA adapters and an identity (skip) connection, which suppresses forgetting. We adopt weight yielding with sliding clustering to better distinguish out-of-domain samples and thereby enhance the routing. We also propose to convert the mixture of low-rank adapters into a model-merging formulation and introduce fast dynamic merging of LoRA adapters to preserve the general capabilities of the base model. Extensive experiments demonstrate that the proposed SLIM is comparable to state-of-the-art PEFT approaches on downstream tasks while achieving the leading performance in mitigating catastrophic forgetting.
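The core routing idea can be illustrated with a minimal sketch: a router softly mixes several LoRA experts with an identity expert, so tokens routed to the identity path pass through the frozen base computation unchanged. This is a hedged illustration only; the class name `SLIMLayer`, the expert count, shapes, and the router design are assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SLIMLayer(nn.Module):
    """Hypothetical sketch of soft routing between LoRA experts and an
    identity expert; all names and hyperparameters are illustrative."""

    def __init__(self, d_model: int, rank: int = 8, n_lora_experts: int = 4):
        super().__init__()
        # Router scores n_lora_experts LoRA experts plus 1 identity expert.
        self.router = nn.Linear(d_model, n_lora_experts + 1)
        # Per-expert low-rank factors A (down-projection) and B (up-projection);
        # B starts at zero so each LoRA expert initially contributes nothing.
        self.A = nn.Parameter(torch.randn(n_lora_experts, d_model, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_lora_experts, rank, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Soft routing weights over all experts.
        w = F.softmax(self.router(x), dim=-1)                        # (batch, n+1)
        # Apply every LoRA expert: x @ A_e @ B_e for each expert e.
        h = torch.einsum("bd,edr->ber", x, self.A)                   # (batch, n, rank)
        lora_out = torch.einsum("ber,erd->bed", h, self.B)           # (batch, n, d)
        # Weighted mixture of LoRA outputs plus the identity expert's path,
        # which simply passes x through scaled by its routing weight.
        mix = torch.einsum("be,bed->bd", w[:, :-1], lora_out)
        return w[:, -1:] * x + mix
```

In this toy form, routing mass placed on the identity expert attenuates the adapters' contribution, which is one way to picture how skipping the LoRA path can help preserve the base model's behavior.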