Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

Contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts, enabling effective zero-shot image recognition. Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and a risk of over-fitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter that refines the CLIP features, guided by the few-shot samples, in an online manner. With a few training samples, our method enables effective few-shot learning and generalizes to unseen data or tasks without additional fine-tuning, achieving competitive performance and high efficiency. Without bells and whistles, our approach outperforms the state-of-the-art online few-shot learning method by an average of 3.6% on eight image classification datasets, with higher inference speed. Furthermore, our model is simple and flexible, serving as a plug-and-play module directly applicable to downstream tasks. Without further fine-tuning, Meta-Adapter obtains notable performance improvements in open-vocabulary object detection and segmentation tasks.
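The abstract does not spell out the adapter's internals, so the following is only a minimal sketch of what a residual-style, online refinement of class embeddings could look like. It assumes that each class text embedding attends over that class's few-shot image embeddings and that the attention-pooled result is added back as a residual. The function names, the `gate` and `temperature` parameters, and the use of plain lists in place of real CLIP features are all illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_class_embedding(text_emb, support_embs, gate=0.5, temperature=0.1):
    """Hypothetical residual refinement of one class embedding.

    text_emb:     text embedding of the class (query).
    support_embs: few-shot image embeddings of that class (keys and values).
    gate:         assumed scalar weight on the residual branch.
    """
    q = l2_normalize(text_emb)
    keys = [l2_normalize(s) for s in support_embs]
    # Attention weights: similarity of the class query to each support sample.
    weights = softmax([dot(q, k) / temperature for k in keys])
    # Attention-pooled summary of the support samples.
    pooled = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(q))]
    # Residual connection: the original embedding plus the gated correction.
    refined = [qi + gate * pi for qi, pi in zip(q, pooled)]
    return l2_normalize(refined)


def classify(image_emb, class_embs):
    """Return the index of the class embedding most similar to the image."""
    x = l2_normalize(image_emb)
    scores = [dot(x, c) for c in class_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch nothing is updated at inference time: given new few-shot samples, `refine_class_embedding` is simply called again. That is one way to read "online" in the abstract. With `gate=0` the function reduces to the normalized zero-shot text embedding, which is what makes the design residual.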


