Large language models demonstrate impressive performance on downstream tasks, yet fully fine-tuning all of their parameters consumes extensive resources. To mitigate this, Parameter-Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into task-specific directions (TSDs): the directions critical for transitioning large models from their pretrained state to task-specific capabilities in PEFT. We propose a framework that clearly defines these directions and explores their properties and the practical challenges of utilizing them. We then introduce a novel approach, LoRA-Dash, which maximizes the impact of TSDs during fine-tuning, thereby enhancing model performance on targeted tasks. Additionally, building on our exploration of TSDs, we address an important issue in PEFT: the initialization of LoRA. Although prior works have pointed out the significance of initialization for LoRA's performance and proposed various strategies, these methods are largely empirical and not task-specific. To address this issue, we propose LoRA-Init. Starting from TSDs, we identify the directions that require the most adjustment during fine-tuning on downstream tasks; by initializing the LoRA matrices with these directions, LoRA-Init significantly enhances LoRA's performance. Moreover, LoRA-Dash and LoRA-Init can be combined into a final TSD-based variant of LoRA, which we refer to as LoRA-TSD. Extensive experiments demonstrate the effectiveness of these methods, and in-depth analyses reveal the underlying mechanisms behind their success.
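For context, here is a minimal sketch of the standard LoRA reparameterization that these methods build on: the pretrained weight W stays frozen while a trainable low-rank update (alpha/r) * B @ A is learned on top of it. The class name and default hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A."""
    def __init__(self, weight: torch.Tensor, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen W
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(out_dim, rank))        # trainable up-projection; zero so dW = 0 at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x (W + scale * B A)^T; only A and B receive gradients
        return x @ (self.weight + self.scale * self.B @ self.A).T
```

Initializing B to zero and A randomly is the common LoRA default; the abstract's point is that this default ignores the task, which is what LoRA-Init revisits.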
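And a hypothetical sketch of what TSD-based initialization could look like, assuming (purely for illustration) that candidate directions are the singular directions of the pretrained weight, scored by how strongly a proxy update (e.g., from a few warm-up gradient steps) projects onto them. The function name `tsd_init`, the `delta_proxy` argument, and the scoring rule are assumptions, not the paper's exact definitions.

```python
import torch

def tsd_init(weight: torch.Tensor, delta_proxy: torch.Tensor, rank: int):
    """Seed LoRA's A with the pretrained singular directions that a proxy
    update suggests need the most adjustment (illustrative criterion)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    # Score each singular direction pair (u_i, v_i) by |u_i^T dW v_i|.
    scores = torch.abs(torch.einsum("mi,mn,ni->i", U, delta_proxy, Vh.T))
    top = torch.topk(scores, rank).indices
    A_init = Vh[top, :]                          # (rank, in_dim): selected right-singular directions
    B_init = torch.zeros(weight.shape[0], rank)  # keep dW = 0 at step 0
    return A_init, B_init
```

Keeping B at zero preserves the pretrained model at initialization while steering subsequent updates toward the selected directions; other selection or scaling rules would fit the same template.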