Parameter-Efficient Fine-Tuning (PEFT) methods have gained significant popularity for adapting pre-trained Large Language Models (LLMs) to downstream tasks, primarily because they can substantially reduce memory and computational overheads. However, most PEFT approaches share a common limitation: they apply a uniform architectural design across all layers, attaching identical trainable modules to every layer while ignoring the layers' varying importance, which leads to sub-optimal fine-tuning results. To overcome this limitation and obtain better performance, we develop a novel approach, Importance-aware Sparse Tuning (IST), which exploits the model's inherent sparsity and uses an effective layer-wise importance score to select the most important subset of layers. IST is a versatile, plug-and-play technique compatible with any PEFT method that operates on a per-layer basis. Leveraging the estimated importance scores, IST dynamically updates only the selected layers' PEFT modules, reducing memory demands. We further provide a theoretical proof of convergence and empirical evidence of superior performance to demonstrate the advantages of IST over uniform updating strategies. Extensive experiments across a range of LLMs, PEFT methods, and downstream tasks substantiate the effectiveness of our proposed method, showcasing IST's capacity to enhance existing layer-based PEFT methods. Our code is available at https://github.com/Kaiseem/IST.
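The core idea of updating only an important subset of layers can be sketched as follows. This is a minimal illustration under stated assumptions: the helpers `topk_layers` and `set_trainable`, and the greedy top-k selection rule, are hypothetical simplifications, not the paper's actual importance estimation or sampling procedure.

```python
# Sketch: importance-aware layer selection for per-layer PEFT modules.
# Assumption: each layer's PEFT module is represented as a dict with a
# "trainable" flag; a real implementation would toggle requires_grad
# on the module's parameters instead.

def topk_layers(scores, k):
    """Return indices of the k layers with the highest importance scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def set_trainable(peft_modules, active):
    """Freeze all per-layer PEFT modules except the selected subset."""
    for i, module in enumerate(peft_modules):
        module["trainable"] = i in active

# Example: 4 layers, update only the 2 most important each step.
scores = [0.1, 0.9, 0.3, 0.7]        # estimated layer-wise importance
modules = [{"trainable": False} for _ in scores]
active = topk_layers(scores, k=2)    # -> [1, 3]
set_trainable(modules, set(active))
```

Because gradients (and optimizer state) are only needed for the selected layers, the frozen layers' backward passes can be skipped, which is the source of the memory savings described above.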