We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a new PEFT method called Robust Adaptation (RoSA), inspired by robust principal component analysis, that jointly trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full fine-tuning (FFT) solution. Across a series of challenging generative tasks that require fine-tuning for good performance, such as grade-school math and SQL query generation, we show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget, and can even recover the performance of FFT on some tasks. To complement the training algorithm, we provide system support for RoSA in the form of sparse GPU kernels that enable memory- and compute-efficient training. We also show that RoSA is compatible with low-precision base weights, resulting in the first joint representation combining quantization, low-rank, and sparse approximations. Our code is available at https://github.com/IST-DASLab/RoSA.
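As a rough illustration of the representation described above, the following minimal sketch forms an adapted weight as the frozen pretrained matrix plus a low-rank product plus a sparse correction, and counts the trainable parameters for a given budget. It is not the authors' implementation: the function names `matmul`, `rosa_style_weight`, and `trainable_params`, the dictionary storage for the sparse component, and the example sizes are all assumptions made for illustration. The abstract does not specify how the sparse support is selected or how the components are initialized, so the sketch leaves both out.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rosa_style_weight(w, b, a, sparse):
    """Return W + B @ A + S.

    w      : frozen pretrained weights, shape (d_out, d_in)
    b, a   : low-rank factors, shapes (d_out, r) and (r, d_in)
    sparse : hypothetical storage for S as {(row, col): value}
    """
    low_rank = matmul(b, a)
    out = [[w[i][j] + low_rank[i][j] for j in range(len(w[0]))]
           for i in range(len(w))]
    for (i, j), value in sparse.items():
        out[i][j] += value
    return out


def trainable_params(d_out, d_in, rank, nnz):
    """Trainable values: both low-rank factors plus the sparse nonzeros."""
    return rank * (d_out + d_in) + nnz


# Tiny worked example: 2x2 identity base, rank-1 update, one sparse entry.
w = [[1, 0], [0, 1]]
b = [[1], [2]]
a = [[3, 4]]
sparse = {(0, 1): 0.5}
adapted = rosa_style_weight(w, b, a, sparse)

# Budget for an assumed 4096x4096 layer with rank 16 and 100,000 nonzeros.
budget = trainable_params(4096, 4096, 16, 100_000)
fraction = budget / (4096 * 4096)
```

In this assumed configuration the adapter trains 231,072 values, under 1.4% of the 16,777,216 entries in the dense layer. The sketch materializes the full adapted matrix only for clarity; an efficient implementation would keep the three components separate.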