We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations; these are passed to external cross-attention adapters, which are trained from scratch and then combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieves performance comparable to full fine-tuning, at a cost comparable to fine-tuning just the last layer. For example, with a cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve performance within 0.2% of the full fine-tuning paragon at a computational training cost of 51% of the baseline, on average across 11 downstream classification tasks. Unlike other forms of efficient adaptation, InCA does not require backpropagating through the pre-trained model, and thus leaves its execution unaltered at both training and inference. The versatility of InCA is best illustrated in fine-grained tasks, which may require information that is absent in the last layer but accessible in intermediate-layer activations. Since the backbone is fixed, InCA allows parallel ensembling as well as parallel execution of multiple tasks. InCA achieves state-of-the-art performance on the ImageNet-to-Sketch multi-task benchmark.
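The data flow described above (one pass through a frozen backbone, one external cross-attention adapter per tapped layer, then selection of the top-scoring adapter) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `frozen_backbone`, the single-query single-head `Adapter`, the random data, and the accuracy-based selection are hypothetical stand-ins, not the paper's architecture, and adapter training is omitted entirely.

```python
import math
import random

DIM, NUM_CLASSES, NUM_TOKENS = 4, 3, 5  # toy sizes, chosen arbitrarily


def frozen_backbone(tokens):
    """Hypothetical stand-in for a frozen pre-trained model.

    Returns the activations of every tapped layer from a single pass.
    It holds no trainable state, so nothing is backpropagated through it.
    """
    acts, h = {}, tokens
    for k in range(3):
        h = [[math.tanh((k + 1) * v) for v in tok] for tok in h]
        acts[f"layer{k}"] = h
    return acts


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(query, tokens):
    """Single-head cross-attention: one query vector attends over all tokens."""
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return pooled, weights


class Adapter:
    """External adapter (assumed form): a query vector plus a linear classifier."""

    def __init__(self, dim, num_classes, rng):
        self.query = [rng.gauss(0, 0.1) for _ in range(dim)]
        self.W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def __call__(self, tokens):
        pooled, _ = cross_attention_pool(self.query, tokens)
        return [sum(w * x for w, x in zip(row, pooled)) + b
                for row, b in zip(self.W, self.b)]


def accuracy(adapter, cached_acts, labels, layer):
    correct = 0
    for acts, y in zip(cached_acts, labels):
        logits = adapter(acts[layer])
        pred = max(range(len(logits)), key=logits.__getitem__)
        correct += int(pred == y)
    return correct / len(labels)


rng = random.Random(0)
samples = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
           for _ in range(8)]
labels = [rng.randrange(NUM_CLASSES) for _ in samples]

# One backbone pass per sample; every adapter reuses the cached activations.
cached = [frozen_backbone(s) for s in samples]

# One adapter per tapped layer (untrained here; training is omitted).
adapters = {name: Adapter(DIM, NUM_CLASSES, rng) for name in cached[0]}

# Score each adapter on held-out data and keep the top-scoring one.
scores = {name: accuracy(a, cached, labels, name) for name, a in adapters.items()}
best_layer = max(scores, key=scores.get)
print(best_layer, scores[best_layer])
```

Because the adapters only read cached activations, they are independent of one another, which is what makes evaluating them (or ensembling them) in parallel straightforward in this sketch.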