Large Language Models (LLMs) such as ChatGPT demonstrate strong few-shot adaptability without requiring fine-tuning, making them ideal for data-limited and real-time applications. However, this adaptability has not yet been replicated in current Visual Foundation Models (VFMs), which still require explicit fine-tuning with sufficient tuning data. Moreover, the pretraining-finetuning paradigm has led to a surge of task-specific modular components, such as Low-Rank Adaptation (LoRA). For the first time, we explore the potential of reusing diverse pre-tuned LoRAs, without accessing their original training data, to achieve tuning-free few-shot adaptation in VFMs. Our framework, LoRA Recycle, distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective, using surrogate data generated inversely from the pre-tuned LoRAs themselves. Once equipped with the meta-LoRA, the VFM can solve new few-shot tasks in a single forward pass, akin to the in-context learning of LLMs. Additionally, we incorporate a double-efficient mechanism tailored to our framework, significantly accelerating the meta-training process while maintaining or even improving performance. Extensive experiments on various few-shot classification benchmarks in both in-domain and cross-domain scenarios demonstrate the superiority of our framework.
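To make the two stages concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: inverting surrogate data from a pre-tuned LoRA, and meta-training a meta-LoRA on the resulting surrogate few-shot tasks. It assumes a ViT-style feature extractor with pluggable LoRA adapters; every name here (`backbone`, `head`, the `adapter=` argument) is a hypothetical placeholder, and the prototype-based readout is one plausible way to realize single-forward-pass adaptation, not necessarily the authors' exact design.

```python
# Hypothetical sketch of LoRA Recycle's two stages; `backbone` is assumed to
# be a frozen ViT that accepts a pluggable LoRA via `adapter=`, and `head`
# a frozen classifier head paired with each pre-tuned LoRA.
import torch
import torch.nn.functional as F

def invert_surrogate_data(backbone, head, lora, labels, steps=200, lr=0.1):
    """Stage 1: generate surrogate images by inverting a pre-tuned LoRA.

    Pixels start from noise and are optimized (data-free) so that the frozen
    LoRA-equipped model assigns them the requested class labels.
    """
    x = torch.randn(len(labels), 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = head(backbone(x, adapter=lora))
        F.cross_entropy(logits, labels).backward()
        opt.step()
    return x.detach()

def episode_loss(backbone, meta_lora, support_x, support_y, query_x, query_y):
    """Stage 2: meta-learning objective on a surrogate few-shot task.

    Support and query images are embedded by the meta-LoRA-equipped backbone;
    queries are classified against class prototypes (mean support embeddings),
    so the task is solved in a single forward pass with no per-task tuning.
    Assumes episode labels are remapped to 0..C-1.
    """
    z_s = backbone(support_x, adapter=meta_lora)
    z_q = backbone(query_x, adapter=meta_lora)
    protos = torch.stack([z_s[support_y == c].mean(0)
                          for c in support_y.unique()])
    logits = -torch.cdist(z_q, protos)  # nearest-prototype scores
    return F.cross_entropy(logits, query_y)  # backprop updates meta_lora only
```

In meta-training, each pre-tuned LoRA would repeatedly supply surrogate support/query episodes via `invert_surrogate_data`, and gradients of `episode_loss` would update only the meta-LoRA parameters, leaving the VFM backbone frozen.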