Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the barrier to LLM training would encourage greater participation from researchers, benefiting both academia and society. While existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters, few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update into one step to reduce memory usage. By integrating LOMO with existing memory-saving techniques, we reduce memory usage to 10.8% of that required by the standard approach (the DeepSpeed solution). Consequently, our approach enables full-parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090 GPUs, each with 24GB of memory.
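To make "fusing the gradient computation and the parameter update" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes a hypothetical chain of scalar layers (y = w_2 · w_1 · w_0 · x) trained with plain SGD on a squared-error loss, and all function names are hypothetical. The baseline materializes every gradient before updating any weight. The fused version updates each weight as soon as its gradient is computed during the backward pass and then discards it, so at most one gradient is alive at a time. Both yield identical weights because the upstream gradient is propagated using each weight's pre-update value.

```python
from typing import List, Tuple


def forward(weights: List[float], x: float) -> List[float]:
    """Activations of a hypothetical chain of scalar layers: a_0 = x, a_{i+1} = w_i * a_i."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def standard_sgd_step(weights: List[float], x: float, target: float,
                      lr: float) -> Tuple[List[float], int]:
    """Baseline: store every gradient, then update. Returns (new weights, peak live grads)."""
    acts = forward(weights, x)
    upstream = acts[-1] - target  # dL/dy for L = 0.5 * (y - target)^2
    grads = [0.0] * len(weights)  # one slot per parameter, all alive at once
    for i in reversed(range(len(weights))):
        grads[i] = upstream * acts[i]     # dL/dw_i
        upstream = upstream * weights[i]  # dL/da_i, passed to the previous layer
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, len(grads)


def fused_sgd_step(weights: List[float], x: float, target: float,
                   lr: float) -> Tuple[List[float], int]:
    """Fused sketch: update each weight as soon as its gradient exists, then drop it."""
    weights = list(weights)  # work on a copy so the caller's list is untouched
    acts = forward(weights, x)
    upstream = acts[-1] - target
    live, peak = 0, 0  # bookkeeping only: how many gradients exist at once
    for i in reversed(range(len(weights))):
        grad = upstream * acts[i]  # gradient for this layer only
        live += 1
        peak = max(peak, live)
        upstream = upstream * weights[i]  # propagate with the pre-update weight
        weights[i] -= lr * grad           # parameter update fused into the backward pass
        del grad                          # release the gradient immediately
        live -= 1
    return weights, peak


if __name__ == "__main__":
    w0 = [0.5, -1.5, 2.0]
    std, std_peak = standard_sgd_step(w0, x=1.0, target=0.5, lr=0.1)
    fused, fused_peak = fused_sgd_step(w0, x=1.0, target=0.5, lr=0.1)
    print(std == fused, std_peak, fused_peak)  # True 3 1
```

The peak count is the point of the comparison: the baseline holds one gradient per parameter, while the fused loop never holds more than one. In a real framework the same idea would be applied per parameter tensor as its gradient becomes available during backpropagation.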