Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs). Parameter-efficient fine-tuning methods, such as LoRA, are widely adopted to adapt pretrained MoE LLMs to downstream tasks. However, existing approaches typically assign identical LoRA ranks to all expert modules, ignoring the heterogeneous specialization of pretrained experts. This uniform allocation leads to a resource mismatch: task-relevant experts are under-provisioned, while less relevant ones receive redundant parameters. To address this, we propose DR-LoRA, a Dynamic Rank LoRA framework for fine-tuning pretrained MoE models. Specifically, DR-LoRA initializes all expert LoRA modules with a small active rank and uses an expert saliency score, which combines routing frequency and gradient-based rank importance, to identify the experts that would benefit most from additional capacity. It then periodically expands the active ranks of the LoRA modules of these task-critical experts, progressively constructing a heterogeneous rank distribution tailored to the target task. Experiments on three MoE models across six tasks show that DR-LoRA consistently outperforms LoRA and other strong baselines, demonstrating that task-adaptive heterogeneous rank allocation is an effective strategy for improving active capacity utilization in MoE fine-tuning.
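As a rough illustration of the rank-expansion step described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the saliency score is the product of routing frequency and rank importance (the abstract says only that the score "combines" the two), and that each expansion round grows the top-k experts by a fixed increment up to a preallocated maximum rank. The names `ExpertLoRAState`, `saliency`, `expand_ranks`, `top_k`, and `step` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpertLoRAState:
    """Bookkeeping for one expert's LoRA module (hypothetical structure)."""
    active_rank: int        # rank currently in use
    max_rank: int           # preallocated upper bound on the rank
    routing_freq: float     # fraction of tokens routed to this expert
    rank_importance: float  # gradient-based importance, computed elsewhere


def saliency(e: ExpertLoRAState) -> float:
    # Assumption: combine the two signals by a simple product.
    # The actual DR-LoRA formula may differ.
    return e.routing_freq * e.rank_importance


def expand_ranks(experts: List[ExpertLoRAState], top_k: int, step: int) -> List[int]:
    """Grow the active rank of the top_k most salient experts by `step`.

    Returns the indices of the experts whose rank was expanded.
    """
    # Only experts that still have headroom are candidates.
    candidates = [i for i, e in enumerate(experts) if e.active_rank < e.max_rank]
    # Highest saliency first; the sort is stable, so ties keep index order.
    candidates.sort(key=lambda i: saliency(experts[i]), reverse=True)
    grown = []
    for i in candidates[:top_k]:
        e = experts[i]
        e.active_rank = min(e.active_rank + step, e.max_rank)
        grown.append(i)
    return grown


# Usage: four experts start at rank 2; one expansion round grows the two most salient.
experts = [
    ExpertLoRAState(active_rank=2, max_rank=8, routing_freq=0.50, rank_importance=0.8),
    ExpertLoRAState(active_rank=2, max_rank=8, routing_freq=0.10, rank_importance=0.9),
    ExpertLoRAState(active_rank=2, max_rank=8, routing_freq=0.30, rank_importance=0.5),
    ExpertLoRAState(active_rank=2, max_rank=8, routing_freq=0.10, rank_importance=0.1),
]
grown = expand_ranks(experts, top_k=2, step=2)
```

In a training loop, a function like `expand_ranks` would be called every fixed number of steps after refreshing `routing_freq` and `rank_importance`, so that repeated rounds gradually produce a heterogeneous rank distribution. Under this sketch's product rule, a frequently routed expert with moderate importance can outrank a rarely routed expert with high importance.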