Large Language Models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and to generalize robustly across a wide range of tasks. However, these models often hit performance limits on multi-task workloads due to constrained model capacity, and expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce a novel approach, Parameter-Efficient Sparsity Crafting (PESC), which transforms dense models into sparse models with a Mixture-of-Experts (MoE) architecture. PESC integrates adapters into the MoE layers of the sparse models, differentiating the experts without altering the individual weights within these layers. This method significantly reduces computational cost and GPU memory requirements while expanding model capacity through a minimal increase in parameters from the inserted adapters. Our empirical evaluation demonstrates the effectiveness of PESC: the sparse models trained with it during instruction tuning, dubbed Camelidae, outperform all other open-source sparse models and exhibit superior general capabilities compared to GPT-3.5.
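To make the mechanism concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: every expert shares one frozen feed-forward block, and only a small per-expert bottleneck adapter (plus the router) carries trainable parameters. The names `PESCMoELayer`, `Adapter`, the bottleneck size, and the expert/top-k counts are illustrative assumptions, not the authors' released code.

```python
# Sketch of the PESC idea: experts share frozen FFN weights and are
# differentiated only by lightweight adapters. Illustrative, not the official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight bottleneck adapter trained per expert (assumed design)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))  # residual adapter


class PESCMoELayer(nn.Module):
    """MoE layer whose experts share one frozen FFN and differ only by adapters."""
    def __init__(self, dim, hidden, num_experts=4, top_k=2):
        super().__init__()
        self.shared_ffn = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )
        for p in self.shared_ffn.parameters():  # original dense weights stay untouched
            p.requires_grad = False
        self.adapters = nn.ModuleList(Adapter(dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)  # route each token to top-k experts
        shared = self.shared_ffn(x)  # identical for all experts, so computed once
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.adapters)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.adapters[e](shared[mask])
        return out


# Usage example with assumed dimensions.
tokens = torch.randn(8, 512)
layer = PESCMoELayer(dim=512, hidden=2048)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

Because only the adapters and the router require gradients, the trainable-parameter count grows with the small bottleneck size rather than with the full FFN width, which is the source of the memory and compute savings the abstract describes.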