We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters that are trained anew and then combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieves performance comparable to full fine-tuning at a cost comparable to fine-tuning just the last layer. For example, with a cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve performance within 0.2% of the full fine-tuning paragon at 51% of the baseline's training cost, on average across 11 downstream classification tasks. Unlike other forms of efficient adaptation, InCA does not require backpropagating through the pre-trained model, so the model's execution is left unaltered at both training and inference. The versatility of InCA is best illustrated on fine-grained tasks, which may require information that is absent from the last layer but accessible in intermediate-layer activations. Since the backbone is fixed, InCA allows parallel ensembling as well as parallel execution of multiple tasks. InCA achieves state-of-the-art performance on the ImageNet-to-Sketch multi-task benchmark.
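To make the adapter idea concrete, here is a minimal forward-pass sketch of a cross-attention probe reading cached activations from several layers of a frozen backbone. It is not the authors' implementation. The single attention head, the learnable query tokens mean-pooled into one feature vector, the toy random "backbone", and every name (`CrossAttentionProbe`, `fake_backbone_activations`, etc.) are hypothetical and chosen for illustration only. Training is omitted: since no gradient flows into the backbone, the per-layer activations can be computed once and reused, and only the probe parameters would be updated.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionProbe:
    """Hypothetical single-head cross-attention adapter for ONE backbone layer.

    Learnable query tokens attend to that layer's frozen token activations;
    the attended features are averaged over queries and mapped to class logits.
    Only these parameters would be trained; the backbone is never modified.
    """

    def __init__(self, dim: int, num_queries: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.queries = random_matrix(num_queries, dim, rng)  # (m, d)
        self.w_k = random_matrix(dim, dim, rng)               # key projection
        self.w_v = random_matrix(dim, dim, rng)               # value projection
        self.w_out = random_matrix(dim, num_classes, rng)     # linear classifier head

    def __call__(self, tokens: Matrix) -> List[float]:
        keys = matmul(tokens, self.w_k)      # (n, d)
        values = matmul(tokens, self.w_v)    # (n, d)
        scale = 1.0 / math.sqrt(self.dim)
        attended = []
        for q in self.queries:
            # Scaled dot-product attention of one query over all n tokens.
            weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
            attended.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(self.dim)])
        pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean over queries
        return matmul([pooled], self.w_out)[0]  # class logits, length num_classes


def fake_backbone_activations(num_layers: int, num_tokens: int, dim: int, seed: int = 1) -> Dict[int, Matrix]:
    """Stand-in for ONE forward pass of a frozen backbone that returns every
    layer's token activations (a real model might collect them with hooks)."""
    rng = random.Random(seed)
    return {layer: random_matrix(num_tokens, dim, rng) for layer in range(num_layers)}


# One forward pass, then one independent adapter per layer.
activations = fake_backbone_activations(num_layers=4, num_tokens=5, dim=8)
probes = {layer: CrossAttentionProbe(dim=8, num_queries=2, num_classes=3, seed=layer) for layer in activations}
logits = {layer: probes[layer](tokens) for layer, tokens in activations.items()}
```

Because each adapter only reads cached activations, the adapters are independent of one another and of the backbone. They could be trained in parallel, and a single top-scoring one (e.g. by validation accuracy, not shown here) could then be kept or several combined.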