Parameter-Efficient Fine-Tuning (PEFT), which adapts pre-trained models to downstream tasks by adjusting or introducing only a small number of trainable parameters, has recently attracted considerable research interest. However, existing PEFT methods within the traditional fine-tuning framework have two main shortcomings: 1) they overlook the explicit association between the trainable parameters and downstream task knowledge; 2) they neglect the interaction between the intrinsic task-agnostic knowledge of pre-trained models and the task-specific knowledge of downstream tasks. To address these gaps, we propose GIST, a novel plug-and-play fine-tuning framework. Specifically, when PEFT methods are applied to downstream tasks, our framework first introduces a trainable token, called the Gist token. This token serves as an aggregator of the task-specific knowledge learned by the PEFT methods and forms an explicit association with downstream knowledge. Furthermore, to facilitate explicit interaction between task-agnostic and task-specific knowledge, we introduce Knowledge Interaction via a Bidirectional Kullback-Leibler Divergence objective. As a result, PEFT methods within our framework can leverage this knowledge interaction to help the pre-trained model understand downstream tasks more comprehensively. Extensive experiments demonstrate the universality and scalability of our framework. Notably, on the VTAB-1K benchmark, we employ Adapter (a prevalent PEFT method) within our GIST framework and achieve a performance gain of 2.25% with only 0.8K additional parameters. The code will be released.
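To make the Knowledge Interaction objective concrete, the sketch below computes a bidirectional (symmetric) Kullback-Leibler divergence between two predicted class distributions. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify which outputs are compared, so the hypothetical inputs `task_agnostic_logits` (e.g., a head applied to the pre-trained class token) and `task_specific_logits` (e.g., a head applied to the Gist token) are assumptions, as is combining the two directions by summation rather than averaging.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def bidirectional_kl(task_agnostic_logits: Sequence[float],
                     task_specific_logits: Sequence[float]) -> float:
    """Symmetric KL: KL(p || q) + KL(q || p).

    Assumption: the two directions are summed; the GIST abstract does not say
    whether they are summed, averaged, or weighted.
    """
    p = softmax(task_agnostic_logits)
    q = softmax(task_specific_logits)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

For example, `bidirectional_kl([2.0, 0.5, -1.0], [0.1, 1.5, 0.3])` returns a positive value, and it is zero when both logit vectors are identical. Using both directions penalizes each distribution for placing mass where the other does not, so neither the task-agnostic nor the task-specific prediction is treated as the fixed target; this is one plausible reading of "interaction" in the abstract.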