As large language models (LLMs) grow in size, full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training, which reduces memory consumption. Unlike LoRA, MLorc does not enforce a fixed-rank constraint on the weight update matrices and thus enables full-parameter learning. Unlike GaLore, MLorc compresses the momentum directly rather than the gradients, thereby better preserving the training dynamics of full-parameter fine-tuning. We provide a theoretical convergence guarantee under mild assumptions. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning at small ranks (e.g., $r=4$), and generalizes well across different optimizers, without compromising time or memory efficiency.
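The compress-and-reconstruct idea can be illustrated with a minimal sketch. The abstract does not specify the compressor or the optimizer, so everything concrete here is an assumption: the sketch uses a basic randomized range finder for the rank-$r$ factorization and plain SGD with exponential-moving-average momentum, and all function names (`compress`, `reconstruct`, `momentum_step`, `zero_factors`) are hypothetical. It is not the authors' implementation; it only shows how storing two thin factors in place of a dense momentum matrix could look.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormal_columns(A, eps=1e-12):
    """Gram-Schmidt on the columns of A; near-zero columns become zero columns."""
    basis = []
    for col in transpose(A):
        v = list(col)
        for q in basis:
            dot = sum(x * y for x, y in zip(q, v))
            v = [x - dot * y for x, y in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v] if norm > eps else [0.0] * len(v))
    return transpose(basis)

def compress(M, rank, rng):
    """Assumed compressor: randomized range finder, M ~= Q @ B.

    Q is (rows x rank) and B is (rank x cols), so only
    rank * (rows + cols) numbers are kept instead of rows * cols.
    """
    n_cols = len(M[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    Q = orthonormal_columns(matmul(M, omega))
    B = matmul(transpose(Q), M)
    return Q, B

def reconstruct(Q, B):
    """Rebuild the dense momentum estimate from its two factors."""
    return matmul(Q, B)

def zero_factors(n_rows, n_cols, rank):
    """Factors representing an all-zero momentum matrix."""
    Q = [[0.0] * rank for _ in range(n_rows)]
    B = [[0.0] * n_cols for _ in range(rank)]
    return Q, B

def momentum_step(W, grad, factors, rank, rng, lr=0.1, beta=0.9):
    """One SGD-with-momentum step that keeps only compressed momentum.

    1. Reconstruct the previous momentum from its factors.
    2. Update it with the full (uncompressed) gradient.
    3. Apply the dense momentum to the weights.
    4. Compress the new momentum before storing it.
    """
    m_prev = reconstruct(*factors)
    m_new = [[beta * mp + (1 - beta) * g for mp, g in zip(m_row, g_row)]
             for m_row, g_row in zip(m_prev, grad)]
    W_new = [[w - lr * m for w, m in zip(w_row, m_row)]
             for w_row, m_row in zip(W, m_new)]
    return W_new, compress(m_new, rank, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    rows, cols, rank = 4, 3, 1
    W = [[0.0] * cols for _ in range(rows)]
    grad = [[float(i + 1) * (1 - j) for j in range(cols)] for i in range(rows)]
    factors = zero_factors(rows, cols, rank)
    for _ in range(3):
        W, factors = momentum_step(W, grad, factors, rank, rng)
    print("stored numbers:", rank * (rows + cols), "dense:", rows * cols)
```

In this sketch the weight update itself is dense and unconstrained in rank; only the optimizer state carried between steps is low-rank, which is the distinction from LoRA that the abstract draws. Because the gradient enters the update uncompressed and only the momentum is factorized, the sketch also mirrors the stated contrast with GaLore.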