ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts

Parameter-efficient fine-tuning (PEFT) techniques such as low-rank adaptation (LoRA) can effectively adapt large pre-trained foundation models to downstream tasks while training only a small fraction (0.1%-10%) of the original weights. An under-explored question for PEFT is whether it can extend the pre-training phase without supervised labels; that is, can we adapt a pre-trained foundation model to a new domain via efficient self-supervised pre-training on that domain? In this work, we introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts. ExPLoRA initializes a ViT with weights pre-trained on large natural-image datasets, such as those from DinoV2 or MAE, and continues the unsupervised pre-training objective on the new domain, unfreezing 1-2 pre-trained ViT blocks and tuning all other layers with LoRA. We then fine-tune the resulting model on this new domain for supervised learning, using only LoRA. Our experiments demonstrate state-of-the-art results on satellite imagery, even outperforming ViTs that are fully pre-trained and fine-tuned. Using the DinoV2 training objective, we demonstrate up to 8% improvement in linear probing top-1 accuracy on downstream tasks while using <10% of the number of parameters used in prior fully-tuned state-of-the-art approaches. Our ablation studies confirm the efficacy of our approach over other baselines such as PEFT. Code is available on the project website: https://samar-khanna.github.io/ExPLoRA/
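
To make the parameter budget concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It counts trainable parameters when one transformer block is fully unfrozen and the remaining blocks receive LoRA adapters. Everything numeric here is an assumption for illustration and is not stated in the abstract: a ViT-L-like backbone (24 blocks, width 1024, MLP expansion 4), LoRA rank 8, and LoRA applied only to the query and value projections. Patch embedding, positional embeddings, and heads are ignored.

```python
# Illustrative parameter count for "unfreeze one block, LoRA on the rest".
# All sizes below are assumptions for this sketch, not values from the paper.

def block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Parameters in one standard ViT block (weights and biases)."""
    hidden = dim * mlp_ratio
    qkv = 3 * dim * dim + 3 * dim          # fused query/key/value projection
    attn_out = dim * dim + dim             # attention output projection
    mlp = dim * hidden + hidden + hidden * dim + dim  # two MLP linear layers
    norms = 2 * (2 * dim)                  # two LayerNorms (scale and shift)
    return qkv + attn_out + mlp + norms


def lora_params_per_block(dim: int, rank: int, num_adapted: int = 2) -> int:
    """LoRA adds A (rank x dim) and B (dim x rank) per adapted dim x dim matrix."""
    return num_adapted * 2 * rank * dim


def trainable_fraction(depth: int, dim: int, rank: int, unfrozen: int) -> float:
    """Share of block parameters trained: full blocks plus LoRA adapters."""
    total = depth * block_params(dim)
    full = unfrozen * block_params(dim)
    lora = (depth - unfrozen) * lora_params_per_block(dim, rank)
    return (full + lora) / total


# Assumed ViT-L-like configuration with one unfrozen block and rank-8 LoRA.
fraction = trainable_fraction(depth=24, dim=1024, rank=8, unfrozen=1)
print(f"trainable share of block parameters: {fraction:.2%}")
```

Under these assumptions the unfrozen block dominates the trainable budget, while the LoRA adapters on the other 23 blocks contribute well under one percent; the total stays comfortably below the 10% mark mentioned above.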
