With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and introduce only a small number of learnable truncated-SVD modules (so-called LoRA blocks) into the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, changing the rank of the LoRA blocks requires re-training them from scratch); second, optimizing their rank requires an exhaustive and costly search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on natural language understanding (the GLUE benchmark) and language generation tasks (E2E, DART and WebNLG) using pretrained models of different sizes, such as RoBERTa and GPT. Our results show that with DyLoRA we can train dynamic, search-free models at least 4 to 7 times faster than LoRA (depending on the task) without significantly compromising performance. Moreover, our models perform consistently well over a much larger range of ranks than LoRA.
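To make the idea of a LoRA block that can be used at several ranks more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a frozen weight `W0` plus a low-rank update `B @ A`, and it assumes that using the block at a smaller rank means keeping only the first `rank` rows of `A` and columns of `B`. All names (`W0`, `A`, `B`, `r_max`, `delta_weight`, `lora_forward`, `sample_rank`) and the uniform rank sampling are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
d_in, d_out, r_max = 8, 6, 4

W0 = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(r_max, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r_max))                # trainable up-projection (zero init: adapter starts as a no-op)


def delta_weight(rank):
    """Low-rank update built from the first `rank` components only (assumed truncation scheme)."""
    if not 1 <= rank <= r_max:
        raise ValueError(f"rank must be in [1, {r_max}], got {rank}")
    return B[:, :rank] @ A[:rank, :]


def lora_forward(x, rank, scale=1.0):
    """Frozen weight plus the rank-truncated adapter update applied to input x."""
    return W0 @ x + scale * (delta_weight(rank) @ x)


def sample_rank(generator, r_min=1, r_high=r_max):
    """Pick a rank for one training step (uniform sampling is an assumption of this sketch)."""
    return int(generator.integers(r_min, r_high + 1))


x = rng.normal(size=d_in)
b = sample_rank(rng)
y = lora_forward(x, b)   # output has shape (d_out,)
```

In a training loop built on this sketch, a different rank could be drawn at each step so that the leading components of `A` and `B` remain useful on their own; at inference time, `rank` can then be chosen without re-training.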