Large language models demonstrate impressive performance on downstream tasks, yet they require extensive resource consumption when fully fine-tuning all parameters. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions (TSDs), which are critical for transitioning large models from pretrained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process, thereby enhancing model performance on targeted tasks. Additionally, based on our exploration of TSD, we focus on an important issue in PEFT: the initialization of LoRA. While some works have pointed out the significance of initialization for LoRA's performance and proposed various strategies, these methods are often empirical and not task-specific. To address this issue, we propose LoRA-Init. Starting from TSD, we identify the directions that require the most adjustment during fine-tuning for downstream tasks. By initializing the matrices in LoRA with these directions, LoRA-Init significantly enhances LoRA's performance. Moreover, we can combine LoRA-Dash and LoRA-Init to create the final version of LoRA based on TSDs, which we refer to as LoRA-TSD. Extensive experiments have conclusively demonstrated the effectiveness of these methods, and in-depth analyses further reveal the underlying mechanisms behind their success.
翻译:大型语言模型在下游任务上展现出卓越性能,但全参数微调需要消耗大量计算资源。为缓解此问题,参数高效微调(PEFT)策略(如LoRA)应运而生。本文深入探讨任务特定方向(TSD)的概念,该方向对于大型模型从预训练状态向PEFT中的任务特定增强转变至关重要。我们提出一个明确定义这些方向的框架,并探索其特性与实际应用挑战。随后,我们提出一种新方法LoRA-Dash,旨在最大化TSD在微调过程中的影响,从而提升模型在目标任务上的性能。此外,基于对TSD的探索,我们聚焦于PEFT中的一个重要问题:LoRA的初始化。尽管已有研究指出初始化对LoRA性能的重要性并提出多种策略,但这些方法通常基于经验且非任务特定。为解决此问题,我们提出LoRA-Init。从TSD出发,我们识别下游任务微调过程中最需调整的方向,通过将这些方向用于初始化LoRA矩阵,LoRA-Init显著提升了LoRA性能。进一步地,我们可将LoRA-Dash与LoRA-Init结合,构建基于TSD的最终版LoRA,称为LoRA-TSD。大量实验充分证明了这些方法的有效性,深入分析进一步揭示了其成功的内在机制。