Although our work is motivated by the adaptation of text-to-speech synthesis models, we argue that the more general framework of parameter-efficient fine-tuning (PEFT) is appropriate for such adaptation. However, catastrophic forgetting remains an issue with PEFT and damages the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting, provided that the parameter shift of the fine-tuned layers can be computed differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we use established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with low-rank adaptation (LoRA), and we compare how well they preserve pre-training knowledge. Our results demonstrate that our methods overcome catastrophic forgetting without degrading fine-tuning performance, and that the Kronecker-factored approximations preserve pre-training knowledge better than the diagonal ones.
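To make the regularization idea concrete, the following is a minimal sketch of a diagonal Laplace-style penalty applied to a LoRA parameter shift. It is an illustration under stated assumptions, not the authors' implementation: the function names `lora_shift` and `diagonal_laplace_penalty`, the `scale` and `strength` parameters, and the toy curvature values in `fisher` are all hypothetical. The sketch assumes the standard LoRA form, in which the weight shift is the product of two low-rank factors, and a quadratic penalty that weights each squared shift entry by a diagonal curvature estimate.

```python
from typing import List

Matrix = List[List[float]]


def lora_shift(B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """Return the LoRA weight shift delta_W = scale * (B @ A)."""
    rank = len(A)
    cols = len(A[0])
    return [
        [scale * sum(row_b[k] * A[k][j] for k in range(rank)) for j in range(cols)]
        for row_b in B
    ]


def diagonal_laplace_penalty(
    delta_w: Matrix, fisher_diag: Matrix, strength: float = 1.0
) -> float:
    """Quadratic penalty 0.5 * strength * sum_ij F_ij * delta_W_ij ** 2."""
    total = 0.0
    for dw_row, f_row in zip(delta_w, fisher_diag):
        for dw, f in zip(dw_row, f_row):
            total += f * dw * dw
    return 0.5 * strength * total


# Toy example: a 2x2 layer adapted with rank-1 LoRA factors.
B = [[1.0], [2.0]]                 # shape (2, 1)
A = [[0.5, -1.0]]                  # shape (1, 2)
fisher = [[1.0, 0.0], [2.0, 4.0]]  # hypothetical diagonal curvature estimates

delta_w = lora_shift(B, A)
penalty = diagonal_laplace_penalty(delta_w, fisher)
```

In a training loop, a term of this kind would be added to the fine-tuning loss and differentiated with respect to the LoRA factors by an automatic differentiation framework, which is why the shift has to be computable differentiably. A Kronecker-factored variant would replace the elementwise curvature weights with a structured per-layer curvature estimate; that is not shown here.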