Large Language Models (LLMs) are difficult to fully fine-tune (e.g., with instructions or human feedback) due to their sheer number of parameters. A family of parameter-efficient sparse fine-tuning methods has proven promising in terms of performance, but their memory requirements increase proportionally to the size of the LLMs. In this work, we scale sparse fine-tuning to state-of-the-art LLMs like LLaMA 2 7B and 13B. We propose SpIEL, a novel sparse fine-tuning method which, for a desired density level, maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. It iterates over: (a) updating the active deltas, (b) pruning indices (based on the change of magnitude of their deltas), and (c) regrowing indices. For regrowth, we explore two criteria based on either the accumulated gradients of a few candidate parameters or their approximate momenta estimated using the efficient SM3 optimizer. We experiment with instruction-tuning of LLMs on standard dataset mixtures, finding that SpIEL is often superior to popular parameter-efficient fine-tuning methods like LoRA (low-rank adaptation) in terms of performance and comparable in terms of run time. We additionally show that SpIEL is compatible with both quantization and efficient optimizers, which facilitates scaling to ever-larger model sizes. We release the code for SpIEL at https://github.com/AlanAnsell/peft and for the instruction-tuning experiments at https://github.com/ducdauge/sft-llm.
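The update, prune, and regrow cycle described above can be illustrated with a minimal NumPy sketch on a flat toy parameter vector. This is not the released SpIEL implementation: the function names (`dense_params`, `sparse_finetune`), the plain SGD update, the toy quadratic loss, and all hyperparameters are assumptions made for illustration. The pruning rule shown (drop the active indices with the smallest absolute delta) is one reading of the magnitude-change criterion, and the regrowth rule is a simplified version of the accumulated-gradient criterion that uses a dense accumulator rather than a small candidate set.

```python
import numpy as np


def dense_params(theta0, indices, deltas):
    """Pretrained parameters plus the sparse deltas at the active indices."""
    theta = theta0.copy()
    theta[indices] += deltas
    return theta


def sparse_finetune(theta0, grad_fn, density=0.25, steps=60,
                    swap_every=20, n_swap=1, lr=0.2, seed=0):
    """Toy update / prune / regrow loop (hypothetical helper, not SpIEL's API)."""
    rng = np.random.default_rng(seed)
    n = theta0.size
    k = max(1, int(round(density * n)))  # number of active indices
    indices = np.sort(rng.choice(n, size=k, replace=False))
    deltas = np.zeros(k)
    # Assumption: a dense accumulator, for simplicity only. At LLM scale
    # this would defeat the memory savings of tracking a few candidates.
    grad_accum = np.zeros(n)

    for step in range(1, steps + 1):
        grad = grad_fn(dense_params(theta0, indices, deltas))
        deltas -= lr * grad[indices]  # (a) update only the active deltas
        grad_accum += grad

        if step % swap_every == 0 and step < steps:
            # Regrowth candidates: indices that were inactive before pruning.
            inactive = np.setdiff1d(np.arange(n), indices)
            # (b) prune the n_swap active indices with the smallest |delta|.
            keep = np.sort(np.argsort(np.abs(deltas))[n_swap:])
            indices, deltas = indices[keep], deltas[keep]
            # (c) regrow the candidates with the largest accumulated gradient.
            best = np.argsort(-np.abs(grad_accum[inactive]))[:n_swap]
            indices = np.concatenate([indices, inactive[best]])
            deltas = np.concatenate([deltas, np.zeros(best.size)])
            order = np.argsort(indices)
            indices, deltas = indices[order], deltas[order]
            grad_accum[:] = 0.0  # start a fresh accumulation window

    return indices, deltas


# Toy usage: the loss is 0.5 * ||theta - target||^2, so its gradient is theta - target.
target = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 1.0])
theta0 = np.zeros_like(target)
indices, deltas = sparse_finetune(theta0, lambda th: th - target)
theta = dense_params(theta0, indices, deltas)
```

In this sketch the number of active indices stays fixed at the chosen density, and every parameter outside the active set remains at its pretrained value, which is the property that keeps the stored state proportional to the density rather than to the full model size.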