This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. BAdam offers a memory-efficient approach to full-parameter finetuning of large language models and reduces the running time of the backward pass thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior compared with LoRA and LOMO. Furthermore, our downstream evaluation of the instruction-tuned models on MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is available at https://github.com/Ledzy/BAdam.
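To make the idea of block coordinate optimization with Adam as the inner solver concrete, the following is a minimal sketch on a toy quadratic objective. It is not the implementation from the repository above: the function names (`adam_on_block`, `block_coordinate_adam`), the objective, the block partition, and all hyperparameter values are assumptions chosen for illustration. The sketch keeps Adam's moment estimates only for the block currently being updated, which is the kind of saving a block-wise scheme can offer over running Adam on all parameters at once.

```python
import math


def loss(x, targets):
    """Toy objective: f(x) = 0.5 * sum_i (x_i - t_i)^2."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, targets))


def grad(x, targets):
    """Gradient of the toy objective with respect to x."""
    return [xi - ti for xi, ti in zip(x, targets)]


def adam_on_block(x, grad_fn, block, steps,
                  lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on the coordinates listed in `block`, in place.

    Adam's first and second moment estimates are created fresh and stored
    only for the active block (an assumption of this sketch).
    """
    m = {i: 0.0 for i in block}
    v = {i: 0.0 for i in block}
    for t in range(1, steps + 1):
        # The toy computes the full gradient for simplicity; only the
        # entries belonging to the active block are actually used.
        g = grad_fn(x)
        for i in block:
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def block_coordinate_adam(x0, grad_fn, blocks, epochs=10, inner_steps=20, lr=0.1):
    """Cycle through the blocks, updating one block at a time with Adam."""
    x = list(x0)
    for _ in range(epochs):
        for block in blocks:
            adam_on_block(x, grad_fn, block, inner_steps, lr=lr)
    return x


# Usage: four parameters split into two blocks of two coordinates each.
targets = [1.0, -2.0, 3.0, 0.5]
blocks = [[0, 1], [2, 3]]
x_init = [0.0, 0.0, 0.0, 0.0]
x_final = block_coordinate_adam(x_init, lambda x: grad(x, targets), blocks)
print("initial loss:", loss(x_init, targets))
print("final loss:", loss(x_final, targets))
```

In a neural network the blocks would more naturally be groups of layers than index lists, and the gradient would be computed only for the active block; the toy keeps the full gradient purely to stay short.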