Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning

Large language models demonstrate impressive performance on downstream tasks, yet fully fine-tuning all of their parameters consumes extensive resources. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions (TSDs), which are critical for transitioning large models from their pretrained state to task-specific enhancement in PEFT. We propose a framework that clearly defines these directions, and we explore their properties and the practical challenges of utilizing them. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during fine-tuning, thereby enhancing model performance on targeted tasks. Additionally, building on our exploration of TSDs, we address an important issue in PEFT: the initialization of LoRA. While prior works have pointed out the significance of initialization for LoRA's performance and proposed various strategies, these methods are often empirical and not task-specific. To address this issue, we propose LoRA-Init. Starting from TSDs, we identify the directions that require the most adjustment during fine-tuning for downstream tasks. By initializing the LoRA matrices with these directions, LoRA-Init significantly enhances LoRA's performance. Moreover, LoRA-Dash and LoRA-Init can be combined into a final TSD-based version of LoRA, which we refer to as LoRA-TSD. Extensive experiments demonstrate the effectiveness of these methods, and in-depth analyses further reveal the mechanisms behind their success.
