Recent work on parameter-efficient transfer learning (PETL) shows that a pre-trained Vision Transformer can be adapted to downstream recognition tasks with only a few learnable parameters. However, because these methods usually insert new structures into the pre-trained model, all of its intermediate features change and must be stored for back-propagation, which makes training memory-heavy. We address this problem from a novel disentangled perspective, dividing PETL into two aspects: task-specific learning and pre-trained knowledge utilization. Specifically, we synthesize a task-specific query with a lightweight learnable module that is independent of the pre-trained model. The synthesized query, which carries task-specific knowledge, extracts the features useful for the downstream task from the intermediate representations of the pre-trained model in a query-only manner. Built upon these features, a customized classification head makes the prediction for the input sample. Because our method uses a lightweight architecture and avoids using heavy intermediate features for gradient descent, it requires little memory during training. Extensive experiments show that our method achieves state-of-the-art performance under memory constraints, demonstrating its applicability in real-world situations.
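The query-only extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_with_queries`, the use of the frozen features as both keys and values, and the toy inputs are all assumptions made here for illustration. The point it tries to convey is that the backbone features enter only as fixed inputs, so the only task-specific quantities are the queries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def extract_with_queries(queries, frozen_features):
    """Cross-attention in which only the queries are task-specific.

    queries: list of query vectors (hypothetical output of the
        lightweight learnable module).
    frozen_features: token vectors from one layer of the frozen
        backbone, used as both keys and values (an assumption of
        this sketch, not a detail confirmed by the abstract).
    Returns one extracted feature vector per query.
    """
    dim = len(frozen_features[0])
    scale = 1.0 / math.sqrt(dim)
    extracted = []
    for q in queries:
        # Scaled dot-product score of this query against every token.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q, k))
            for k in frozen_features
        ]
        weights = softmax(scores)
        # Attention-weighted average of the frozen token vectors.
        extracted.append([
            sum(w * v[j] for w, v in zip(weights, frozen_features))
            for j in range(dim)
        ])
    return extracted


# Toy example: 2 synthesized queries, 3 backbone tokens of dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
frozen_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = extract_with_queries(queries, frozen_features)
```

In a real training setup the frozen features would presumably be computed without tracking gradients, so that only the query module and the classification head hold state for back-propagation; this sketch leaves that part out.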