We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a new PEFT method called Robust Adaptation (RoSA), inspired by robust principal component analysis (PCA), that jointly trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full fine-tuning (FFT) solution. Across a series of challenging generative tasks, such as grade-school math and SQL query generation, which require fine-tuning for good performance, we show that RoSA outperforms both LoRA and pure sparse fine-tuning at the same parameter budget. To complement the training algorithm, we provide system support for RoSA in the form of sparse GPU kernels, which enable training that is efficient in both memory and computation. Our code will be made available at https://github.com/IST-DASLab/RoSA.
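To make the parameterization described above concrete, the following is a minimal sketch of how a frozen weight matrix could be combined with a low-rank term and a sparse term, and of how the trainable-parameter budget might be counted. It is an illustration under stated assumptions, not the released RoSA implementation: the function names `rosa_effective_weight` and `trainable_params`, the dictionary representation of the sparse component, and the use of plain Python lists are all hypothetical choices made for readability. The sketch does not cover how the sparse support is selected or how the sparse GPU kernels work.

```python
def rosa_effective_weight(W, B, A, S):
    """Return W + B @ A + S for small dense matrices stored as lists of lists.

    W: frozen pretrained weights, shape (m, n); never modified here.
    B: low-rank factor, shape (m, r)   (hypothetical name).
    A: low-rank factor, shape (r, n)   (hypothetical name).
    S: sparse component as {(row, col): value} (an assumed storage format).
    """
    m, n = len(W), len(W[0])
    r = len(A)
    # Copy W so the frozen weights stay untouched.
    out = [row[:] for row in W]
    # Add the low-rank update B @ A.
    for i in range(m):
        for j in range(n):
            out[i][j] += sum(B[i][k] * A[k][j] for k in range(r))
    # Add the sparse update on its (few) nonzero positions only.
    for (i, j), value in S.items():
        out[i][j] += value
    return out


def trainable_params(m, n, rank, nnz):
    """Count trainable values: rank * (m + n) for the low-rank factors
    plus nnz for the nonzero entries of the sparse component."""
    return rank * (m + n) + nnz


# Tiny example: a 2x2 frozen matrix, a rank-1 adapter, one sparse entry.
W = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0], [0.0]]
A = [[0.5, 0.5]]
S = {(1, 1): -4.0}
W_eff = rosa_effective_weight(W, B, A, S)
budget = trainable_params(m=2, n=2, rank=1, nnz=len(S))
```

Comparing methods "at the same parameter budget" then amounts to holding a count such as `trainable_params` fixed while trading rank against the number of sparse nonzeros; setting `nnz=0` corresponds to a purely low-rank adapter and `rank=0` to a purely sparse one.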