Fine-tuning all parameters of a large pre-trained language model for each downstream task is prohibitively expensive. Parameter-efficient fine-tuning, which optimizes only a small number of task-specific parameters while keeping the pre-trained model frozen, has therefore attracted attention. In this work, we focus on prefix tuning, which optimizes only continuous prefix vectors (i.e., pseudo tokens) inserted into the Transformer layers. Based on the observation that the learned syntactic and semantic representations vary considerably across layers, we argue that an adaptive prefix can be tailored to each layer better than a fixed one, making fine-tuning more effective and efficient. We therefore propose Adaptive Prefix Tuning (APT), which uses a gate mechanism to adjust the prefix at both the fine-grained token level and the coarse-grained layer level. Experiments on the SuperGLUE and NER datasets show the effectiveness of APT. In addition, using the gate as a probe, we validate the efficiency and effectiveness of the variable prefix.
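The gating idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the gate, so everything below is an assumption made for illustration: the token-level gate is taken to be a sigmoid of a linear score computed from a summary of the previous layer's hidden state, the layer-level gate is taken to be a single scalar per layer, and the names `gated_prefix`, `hidden_summary`, `gate_weights`, `gate_bias`, and `layer_scale` are hypothetical. The pre-trained model itself is not modeled here; only the rescaling of one layer's prefix vectors is shown.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_prefix(prefix, hidden_summary, gate_weights, gate_bias, layer_scale):
    """Rescale one layer's prefix vectors with a token-level and a layer-level gate.

    All parameter names and the gate form are illustrative assumptions.

    prefix:         list of n_prefix vectors, each a list of d floats
    hidden_summary: list of d_h floats summarising the previous layer's output
    gate_weights:   list of n_prefix vectors of length d_h (one per prefix token)
    gate_bias:      list of n_prefix floats
    layer_scale:    float shared by every prefix token in this layer
    """
    gated, token_gates = [], []
    for vec, w, b in zip(prefix, gate_weights, gate_bias):
        # Fine-grained (token-level) gate: one value in (0, 1) per prefix token.
        score = sum(wi * hi for wi, hi in zip(w, hidden_summary)) + b
        alpha = sigmoid(score)
        token_gates.append(alpha)
        # Coarse-grained (layer-level) gate: one scalar applied to the whole layer.
        gated.append([layer_scale * alpha * v for v in vec])
    return gated, token_gates


if __name__ == "__main__":
    random.seed(0)
    n_prefix, d = 3, 4
    prefix = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_prefix)]
    hidden = [random.gauss(0, 1) for _ in range(d)]
    weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_prefix)]
    bias = [0.0] * n_prefix
    gated, gates = gated_prefix(prefix, hidden, weights, bias, layer_scale=0.8)
    print("token-level gates:", gates)
    print("gated prefix:", gated)
```

In a trainable version of this sketch, the prefix vectors, the gate parameters, and the per-layer scale would be the only values updated, with the pre-trained model kept frozen. The token-level gate values could then be inspected per layer, which is one way to read the abstract's use of the gate as a probe.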