We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tasks. By replacing a single LoRA in a layer with multiple LoRA experts and integrating a gating function with the threshold mechanism, AdaMoLE effectively selects and activates the most appropriate experts based on the input context. Our extensive evaluations across a variety of commonsense reasoning and natural language processing tasks show that AdaMoLE exceeds baseline performance. This enhancement highlights the advantages of AdaMoLE's adaptive selection of LoRA experts, improving model effectiveness without a corresponding increase in the expert count. The experimental validation not only confirms AdaMoLE as a robust approach for enhancing LLMs but also suggests valuable directions for future research in adaptive expert selection mechanisms, potentially broadening the scope for optimizing model performance across diverse language processing tasks.
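The mechanism described above — a gating function over multiple LoRA experts, combined with an input-dependent threshold network that decides which experts to activate — can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the layer sizes, the sigmoid-scaled threshold, the renormalization over active experts, and all weight initializations here are assumptions for demonstration (standard LoRA initializes the B matrices to zero; small random values are used below so the output is nontrivial).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's configuration).
d, r, n_experts = 16, 4, 4  # hidden size, LoRA rank, number of experts

# Each LoRA expert is a low-rank pair (A: d x r, B: r x d).
experts = [(rng.normal(scale=0.02, size=(d, r)),
            rng.normal(scale=0.02, size=(r, d))) for _ in range(n_experts)]

W_gate = rng.normal(scale=0.02, size=(d, n_experts))  # gating function
W_thr = rng.normal(scale=0.02, size=(d, 1))           # threshold network

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adamole_forward(x):
    """Adaptive mixture of LoRA experts for one token vector x of shape (d,).

    Instead of a static top-k, the activation threshold tau is computed
    from the input itself, so the number of active experts varies with
    the context.
    """
    gate = softmax(x @ W_gate)                # expert weights, sum to 1
    tau = sigmoid(x @ W_thr)[0] / n_experts   # input-dependent threshold in (0, 1/n)
    active = gate > tau                       # dynamic expert selection
    weights = np.where(active, gate, 0.0)
    weights = weights / weights.sum()         # renormalize over active experts
    delta = np.zeros(d)                       # combined low-rank update
    for w, (A, B) in zip(weights, experts):
        if w > 0:
            delta += w * (x @ A @ B)
    return delta, int(active.sum())

x = rng.normal(size=d)
delta, k = adamole_forward(x)
print(delta.shape, k)
```

Because the softmax gate sums to 1 over the experts, its maximum is at least 1/n_experts, while the sigmoid-scaled threshold stays strictly below 1/n_experts — so at least one expert is always active, and the renormalization is well defined. With a static top-k, `k` would be fixed; here it varies per input.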