Vision foundation models achieve strong performance thanks to their very large model capacity and broad training data. In practice, however, downstream scenarios may support only a small model because of limited computational resources or efficiency requirements. Moreover, the data used to pretrain foundation models are usually inaccessible and differ substantially from the target data of downstream tasks. This poses a critical challenge for the real-world application of foundation models: the knowledge of a foundation model must be transferred to a downstream model with a quite different architecture, using only downstream target data. Existing transfer learning and knowledge distillation methods depend on either a shared model structure or finetuning of the foundation model, so applying them naively is either infeasible or very inefficient. To address this, we propose a Task-Driven Model Reprogramming (TDMR) framework. Specifically, we first reprogram the foundation model to project its knowledge into a proxy space, which alleviates the adverse effects of task mismatch and domain inconsistency. We then reprogram the target model via progressive distillation from the proxy space, so that it efficiently learns the knowledge of the reprogrammed foundation model. TDMR is compatible with different pre-trained model types (CNN, transformer, or a mix of the two) and with limited target data, and it promotes the wide application of vision foundation models to downstream tasks in a cost-effective manner. Extensive experiments on different downstream classification tasks and target model structures demonstrate the effectiveness of our method with both CNN and transformer foundation models.
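For readers unfamiliar with distillation, the following is a minimal sketch of a generic temperature-softened distillation loss between teacher and student logits. It is an illustration under stated assumptions, not the TDMR objective: the abstract does not specify how the proxy space is constructed or how the progressive schedule works, so neither is shown here. The function names `log_softmax` and `distillation_loss` and the default temperature are hypothetical choices made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic knowledge-distillation loss; an assumption for illustration,
    not the loss used by TDMR.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same classes")
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed)
    log_q = log_softmax(student_logits, temperature)  # student (trained)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. In a two-stage setup like the one described, a loss of this kind would presumably compare outputs in the shared proxy space rather than raw class logits.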