Prompt Tuning (PT) enables the adaptation of Pre-trained Large Language Models (PLMs) to downstream tasks by optimizing a small number of soft (virtual) tokens, which are prepended to the input token embeddings. Recently, Decomposed Prompt Tuning (DePT) has demonstrated superior adaptation capability by decomposing the soft prompt into a shorter soft prompt and a pair of low-rank matrices, whose product is added to the input token embeddings as an offset. Thanks to the shorter soft prompt, DePT also achieves faster inference than PT. However, in this paper we find that the position-based token embedding offsets of DePT restrict its ability to generalize across diverse model inputs, and that sharing a single offset across many token embeddings leads to sub-optimal adaptation. To tackle these issues, we introduce Adaptive Decomposed Prompt Tuning (ADePT), which consists of a short soft prompt and a shallow token-shared feed-forward neural network. ADePT uses the token-shared feed-forward network to learn an embedding offset for each token, enabling adaptive offsets that vary with the model input and allowing the token embedding offsets to be optimized more effectively. As a result, ADePT achieves superior adaptation performance without requiring more inference time or additional trainable parameters compared to vanilla PT and its variants. In comprehensive experiments across 23 natural language processing tasks and 4 typical PLMs of different scales, ADePT consistently surpasses other leading parameter-efficient fine-tuning methods, and even outperforms full fine-tuning in certain scenarios. We also provide a theoretical analysis of ADePT. Code is available at https://github.com/HungerPWAY/ADePT.
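To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described above: a short soft prompt is prepended to the sequence, and a shallow token-shared feed-forward network maps each input token embedding to its own offset, so the offsets depend on the token itself rather than on its position. The class name `ADePTSketch`, the dimensions, and the bottleneck size are illustrative assumptions, not the reference implementation from the linked repository.

```python
import torch
import torch.nn as nn


class ADePTSketch(nn.Module):
    """Sketch of ADePT: short soft prompt + token-shared offset network."""

    def __init__(self, embed_dim: int = 768, prompt_len: int = 20, bottleneck: int = 32):
        super().__init__()
        # Short soft prompt, prepended to the (offset) input token embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Shallow feed-forward network shared across all tokens; it produces a
        # per-token embedding offset conditioned on the token embedding itself,
        # unlike DePT's fixed, position-indexed low-rank offsets.
        self.offset_net = nn.Sequential(
            nn.Linear(embed_dim, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, embed_dim),
        )

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) from the frozen PLM's embedding layer.
        offsets = self.offset_net(token_embeds)      # adaptive, input-dependent offsets
        adjusted = token_embeds + offsets            # offset the token embeddings
        prompt = self.soft_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        # The concatenated sequence would then be fed to the frozen PLM.
        return torch.cat([prompt, adjusted], dim=1)


# Usage example with hypothetical shapes.
x = torch.randn(2, 16, 768)   # (batch=2, seq_len=16, embed_dim=768)
out = ADePTSketch()(x)
print(out.shape)              # torch.Size([2, 36, 768]) = prompt_len + seq_len
```

Only the soft prompt and the small offset network are trainable, which keeps the parameter budget comparable to vanilla PT and DePT while letting the offsets adapt to each input.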