This work overcomes the Base-New Tradeoff (BNT) dilemma in prompt tuning: the better a tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Through an in-depth analysis of the features learned on base and new tasks, we observe that the BNT stems from a channel bias issue: the vast majority of feature channels are occupied by base-specific knowledge, which causes the collapse of the task-shared knowledge that is important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from the feature channels into an isolated feature space during prompt tuning, so that task-shared knowledge is maximally preserved in the original feature space and zero-shot generalization to new tasks improves. Importantly, DePT is orthogonal to existing prompt tuning methods and can therefore improve all of them. Extensive experiments on 11 datasets show the strong flexibility and effectiveness of DePT. Our code and pretrained models are available at https://github.com/Koorye/DePT.
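The decoupling idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the isolated feature space is built, so the channel-wise scale and shift, the linear classifier, the `mix` weight, the temperature value, and all names (`DecoupledHeads`, `shared_probs`, `isolated_probs`) are assumptions made for illustration only. The sketch shows just the structural point, namely that base-specific parameters live in a separate head, so predictions on new tasks depend only on the original feature space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class DecoupledHeads:
    """Hypothetical two-head classifier over shared image features.

    - Original feature space: cosine similarity between image and class
      text features (zero-shot style matching).
    - Isolated feature space (assumed form): a channel-wise affine map
      followed by a linear classifier over base classes only.
    """

    def __init__(self, dim, num_base_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Base-specific parameters; in training, only these would absorb
        # base-task knowledge (training loop omitted in this sketch).
        self.scale = np.ones(dim)
        self.shift = np.zeros(dim)
        self.weight = rng.normal(0.0, 0.02, size=(dim, num_base_classes))
        self.bias = np.zeros(num_base_classes)

    def shared_probs(self, image_feats, text_feats, temperature=0.01):
        """Class probabilities from the original feature space."""
        logits = l2_normalize(image_feats) @ l2_normalize(text_feats).T
        return softmax(logits / temperature)

    def isolated_probs(self, image_feats):
        """Class probabilities from the isolated, base-specific space."""
        transformed = image_feats * self.scale + self.shift
        return softmax(transformed @ self.weight + self.bias)

    def predict_base(self, image_feats, base_text_feats, mix=0.5):
        """Base task: combine both heads (assumed fusion rule)."""
        return (mix * self.isolated_probs(image_feats)
                + (1.0 - mix) * self.shared_probs(image_feats, base_text_feats))

    def predict_new(self, image_feats, new_text_feats):
        """New task: use only the original feature space."""
        return self.shared_probs(image_feats, new_text_feats)
```

In this sketch, changing `scale`, `shift`, `weight`, or `bias` leaves `predict_new` unchanged, which is the property the decoupling is meant to provide. How the base-specific head is trained and fused in DePT itself is described in the paper and the linked repository, not here.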