We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a new PEFT method called Robust Adaptation (RoSA), inspired by robust principal component analysis, that jointly trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full fine-tuning (FFT) solution. Across a series of challenging generative tasks that require fine-tuning for good performance, such as grade-school math and SQL query generation, we show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget, and can even recover the performance of FFT on some tasks. To complement the training algorithm, we provide system support for RoSA in the form of sparse GPU kernels that enable memory- and compute-efficient training. We also show that RoSA is compatible with low-precision base weights, resulting in the first joint representation combining quantization, low-rank, and sparse approximations. Our code is available at https://github.com/IST-DASLab/RoSA.
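To make the idea of training low-rank and sparse components on top of fixed weights concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the adapted layer has the additive form $W + BA + S$, where $W$ is the frozen pretrained matrix, $BA$ is a rank-$r$ product, and $S$ is a sparse matrix stored as a dictionary of nonzero entries. The names `AdapterBudget`, `matvec`, and `adapted_forward` are hypothetical and chosen only for illustration; the actual code in the repository relies on sparse GPU kernels.

```python
from dataclasses import dataclass


@dataclass
class AdapterBudget:
    """Trainable-parameter count for one adapted weight matrix (illustrative)."""
    rows: int       # output dimension m of the frozen matrix W
    cols: int       # input dimension n of W
    rank: int       # rank r of the low-rank component B @ A
    density: float  # fraction of W's entries the sparse component may hold

    def low_rank_params(self) -> int:
        # B is (m x r) and A is (r x n), so r * (m + n) values in total.
        return self.rank * (self.rows + self.cols)

    def sparse_params(self) -> int:
        # Number of nonzero values stored in the sparse component S.
        return int(round(self.density * self.rows * self.cols))

    def full_params(self) -> int:
        # Parameters that full fine-tuning would update for this matrix.
        return self.rows * self.cols

    def trainable_fraction(self) -> float:
        return (self.low_rank_params() + self.sparse_params()) / self.full_params()


def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def adapted_forward(W, B, A, S, x):
    """Compute (W + B A + S) x without materializing the summed matrix.

    Assumption: the three components combine additively. S maps
    (row, col) index pairs to values, so only nonzeros are visited.
    """
    base = matvec(W, x)                # frozen pretrained weights
    low_rank = matvec(B, matvec(A, x))  # project to rank r, then back up
    sparse = [0.0] * len(W)
    for (i, j), value in S.items():
        sparse[i] += value * x[j]
    return [b + l + s for b, l, s in zip(base, low_rank, sparse)]


if __name__ == "__main__":
    budget = AdapterBudget(rows=4096, cols=4096, rank=16, density=0.006)
    print(f"trainable fraction: {budget.trainable_fraction():.4f}")

    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [0.0]]
    A = [[1.0, 1.0]]
    S = {(1, 0): 2.0}
    print(adapted_forward(W, B, A, S, [1.0, 2.0]))
```

Under these assumptions, the low-rank part costs $r(m+n)$ trainable values and the sparse part costs one value per stored nonzero, so both can be tuned against the same parameter budget. The matrix size, rank, and density in the usage lines are illustrative values, not settings reported by the authors.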