With the pre-training and fine-tuning paradigm now prevalent, how to efficiently adapt pre-trained models to downstream tasks has become an important question. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has proven effective and is widely applied, its underlying principles remain unclear. In this paper, we adopt the PAC-Bayesian generalization error bound and view pre-training as a shift of the prior distribution, which leads to a tighter bound on the generalization error. We validate this shift from two perspectives: oscillations in the loss landscape and quasi-sparsity in the gradient distribution. Based on these observations, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks, including the GLUE benchmark and instruction tuning. The code is available at https://github.com/song-wx/SIFT/.
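To make the idea of gradient-based sparse fine-tuning concrete, the following is a minimal NumPy sketch on a toy least-squares problem. It is not the authors' implementation (see the repository linked above for that). The abstract does not state the selection rule, so the sketch assumes one plausible choice: keep a fixed mask over the entries with the largest initial gradient magnitude and update only those. The names `topk_gradient_mask`, `sparse_increment_step`, and `density` are hypothetical and introduced purely for illustration.

```python
import numpy as np


def topk_gradient_mask(grad, density):
    """Boolean mask over the `density` fraction of entries with largest |grad|.

    Assumption: top-k by gradient magnitude; the abstract does not specify the rule.
    """
    k = max(1, int(round(density * grad.size)))
    flat = np.abs(grad).ravel()
    idx = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(grad.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(grad.shape)


def sparse_increment_step(weights, grad, mask, lr):
    """Plain SGD step applied only to masked entries; all others stay unchanged."""
    return weights - lr * np.where(mask, grad, 0.0)


# Toy stand-in for a downstream task: linear least squares.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))
w_true = rng.normal(size=20)
y = X @ w_true
# Stand-in for pre-trained weights: already close to a good solution.
w_pre = w_true + 0.1 * rng.normal(size=20)


def loss_and_grad(w):
    residual = X @ w - y
    loss = float(residual @ residual) / len(y)
    grad = 2.0 * X.T @ residual / len(y)
    return loss, grad


loss_before, grad_init = loss_and_grad(w_pre)
mask = topk_gradient_mask(grad_init, density=0.25)  # fixed mask from the first gradient

w = w_pre.copy()
for _ in range(200):
    _, grad = loss_and_grad(w)
    w = sparse_increment_step(w, grad, mask, lr=0.01)
loss_after, _ = loss_and_grad(w)
```

In this sketch only a quarter of the coordinates ever change, so the fine-tuned weights differ from the pre-trained ones by a sparse increment, while the remaining entries keep their pre-trained values exactly.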