Parameter-Efficient Fine-Tuning (PEFT) methods have gained significant popularity for adapting pre-trained Large Language Models (LLMs) to downstream tasks, primarily because they can substantially reduce memory and computational overheads. However, most PEFT approaches share a common limitation: they apply a uniform architectural design across all layers, attaching identical trainable modules to every layer while ignoring the layers' varying importance, which leads to sub-optimal fine-tuning results. To overcome this limitation and obtain better performance, we develop a novel approach, Importance-aware Sparse Tuning (IST), which exploits the model's inherent sparsity and uses an effective layer-wise importance score to select the most important subset of layers. IST is a versatile, plug-and-play technique compatible with any PEFT method that operates on a per-layer basis. Leveraging the estimated importance scores, IST dynamically updates only the selected layers' PEFT modules, reducing memory demands. We further provide a theoretical proof of convergence and empirical evidence of superior performance to demonstrate the advantages of IST over uniform updating strategies. Extensive experiments across a range of LLMs, PEFT methods, and downstream tasks substantiate the effectiveness of our proposed method, showcasing IST's capacity to enhance existing layer-based PEFT methods. Our code is available at https://github.com/Kaiseem/IST.
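The core idea of updating only an important subset of layers can be sketched as follows. This is a minimal illustration under stated assumptions: the helpers `topk_layers` and `set_trainable`, and the greedy top-k selection rule, are hypothetical simplifications, not the paper's actual importance estimation or sampling procedure.

```python
# Sketch: importance-aware layer selection for per-layer PEFT modules.
# Assumption: each layer's PEFT module is represented as a dict with a
# "trainable" flag; a real implementation would toggle requires_grad
# on the module's parameters instead.

def topk_layers(scores, k):
    """Return indices of the k layers with the highest importance scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def set_trainable(peft_modules, active):
    """Freeze all per-layer PEFT modules except the selected subset."""
    for i, module in enumerate(peft_modules):
        module["trainable"] = i in active

# Example: 4 layers, update only the 2 most important each step.
scores = [0.1, 0.9, 0.3, 0.7]        # estimated layer-wise importance
modules = [{"trainable": False} for _ in scores]
active = topk_layers(scores, k=2)    # -> [1, 3]
set_trainable(modules, set(active))
```

Because gradients (and optimizer state) are only needed for the selected layers, the frozen layers' backward passes can be skipped, which is the source of the memory savings described above.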