Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

With the emergence of pre-trained vision-language models such as CLIP, adapting them to various downstream classification tasks has garnered significant attention in recent research. Adaptation strategies can typically be categorized into three paradigms: zero-shot adaptation, few-shot adaptation, and the recently proposed training-free few-shot adaptation. Most existing approaches are tailored to a specific setting and cover only one or two of these paradigms. In this paper, we introduce a versatile adaptation approach that works effectively under all three settings. Specifically, we propose dual memory networks comprising dynamic and static memory components. The static memory caches knowledge from the training data, enabling training-free few-shot adaptation, while the dynamic memory preserves historical test features online during testing, allowing the model to exploit additional data insights beyond the training set. This novel capability enhances performance in the few-shot setting and makes the model usable in the absence of training data. Both memory networks employ the same flexible memory interaction strategy, which can operate in a training-free mode and can be further enhanced by incorporating learnable projection layers. We evaluate our approach on 11 datasets under the three task settings. Remarkably, in the zero-shot scenario, it outperforms existing methods by over 3% and even surpasses methods that use external training data. Additionally, our method exhibits robust performance under natural distribution shifts. Code is available at https://github.com/YBZh/DMN.
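To make the two memories concrete, here is a minimal, training-free sketch in plain Python. It is not the authors' implementation; the linked repository holds that. Several choices below are assumptions made for illustration: the read-out (softmax attention over a class's cached features, yielding a query-adaptive prototype compared by cosine similarity), the FIFO replacement in the dynamic memory, the additive combination with the zero-shot text classifier, and the names `DualMemory`, `memory_readout`, `alpha`, `gamma`, `beta` and `capacity`, which are all hypothetical. In practice the features would come from CLIP's image encoder and `text_embeds` from its text encoder applied to class prompts; the learnable projection layers are omitted.

```python
import math
from collections import deque


def normalize(v):
    """L2-normalize a vector so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_readout(query, entries, beta=5.0):
    """Hypothetical read-out: attend over one class's cached features,
    build a query-adaptive prototype, and return its cosine score.
    An empty memory slot contributes 0.0 (an assumption of this sketch)."""
    if not entries:
        return 0.0
    weights = softmax([beta * dot(query, m) for m in entries])
    proto = [sum(w * m[d] for w, m in zip(weights, entries))
             for d in range(len(query))]
    return dot(query, normalize(proto))


class DualMemory:
    """Zero-shot text classifier plus static (training) and dynamic (test) memories."""

    def __init__(self, text_embeds, capacity=50, alpha=1.0, gamma=1.0):
        self.text = [normalize(t) for t in text_embeds]
        k = len(text_embeds)
        self.static = [[] for _ in range(k)]                       # few-shot training features
        self.dynamic = [deque(maxlen=capacity) for _ in range(k)]  # past test features (FIFO)
        self.alpha, self.gamma = alpha, gamma

    def add_training_example(self, feature, label):
        """Fill the static memory; no gradient step is involved."""
        self.static[label].append(normalize(feature))

    def logits(self, q):
        return [
            dot(q, t)
            + self.alpha * memory_readout(q, self.dynamic[c])
            + self.gamma * memory_readout(q, self.static[c])
            for c, t in enumerate(self.text)
        ]

    def predict(self, feature):
        q = normalize(feature)
        scores = self.logits(q)
        label = max(range(len(scores)), key=scores.__getitem__)
        self.dynamic[label].append(q)  # online write using the pseudo-label
        return label


mem = DualMemory(text_embeds=[[1.0, 0.0], [0.0, 1.0]], capacity=4)
print(mem.predict([0.9, 0.1]), mem.predict([0.2, 0.8]))  # prints "0 1"
```

With no training examples, the model runs in the zero-shot setting and the dynamic memory fills up as test samples arrive; calling `add_training_example` first turns the same object into a training-free few-shot classifier.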
