Large language models (LLMs) have demonstrated remarkable success across various tasks, accompanied by a continuous increase in their parameter size. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address the challenges of fine-tuning LLMs by significantly reducing the number of trainable parameters. Recent studies have integrated LoRA with Mixture-of-Experts (MoE) architectures, leveraging multiple adapter experts and gating mechanisms to further improve fine-tuning performance. However, existing approaches primarily focus on adjusting the allocation of adapter experts per layer to optimize the number of trainable parameters introduced, while neglecting a critical factor: the rank of the adapters. To this end, we propose HILO, a hierarchical scheme for expert allocation and rank configuration that dynamically adjusts the number and rank of adapter experts across layers, matching the varying representational complexity of model layers at adapter granularity. Extensive experiments on multiple benchmark tasks demonstrate that HILO outperforms existing methods in accuracy while introducing fewer trainable parameters, providing an efficient and practical solution for fine-tuning LLMs.