Noisy Test-Time Adaptation in Vision-Language Models

Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing. In open-world scenarios, models often encounter noisy samples, i.e., samples outside the in-distribution (ID) label space. Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), which focuses on adapting the model at test time, in a zero-shot manner, to target data that contains noisy samples. We find that existing TTA methods underperform under ZS-NTTA, often lagging behind even the frozen model. We conduct comprehensive experiments to analyze this phenomenon, revealing that during model updating the negative impact of unfiltered noisy data outweighs the benefits of clean data. Moreover, adapting a single classifier for both ID classification and noise detection hampers both sub-tasks. Building on these findings, we propose a framework that decouples the classifier and the detector, developing a separate detector while keeping the classifier frozen. Technically, we introduce the Adaptive Noise Detector (AdaND), which uses the frozen model's outputs as pseudo-labels to train a noise detector. To handle clean data streams, we further inject Gaussian noise during adaptation, preventing the detector from misclassifying clean samples as noisy. Beyond ZS-NTTA, AdaND also improves the zero-shot out-of-distribution (ZS-OOD) detection ability of VLMs. Experiments show that AdaND outperforms existing methods in both ZS-NTTA and ZS-OOD detection. On ImageNet, compared with state-of-the-art (SOTA) methods, AdaND achieves a notable improvement of $8.32\%$ in harmonic mean accuracy ($\text{Acc}_\text{H}$) for ZS-NTTA and a $9.40\%$ reduction in FPR95 for ZS-OOD detection. Importantly, AdaND is computationally efficient, with a cost comparable to that of the model-frozen method. The code is publicly available at: https://github.com/tmlr-group/ZS-NTTA.
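The decoupled classifier/detector idea can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation (that lives in the linked repository): the max-softmax confidence score, the fixed pseudo-label threshold `tau`, the logistic-regression detector on frozen features, and the feature-level Gaussian noise are all assumptions made for illustration, and `frozen_zero_shot`, `NoiseDetector` and `adapt_step` are hypothetical names. What it does show is the division of labour described above: the zero-shot classifier is never updated, its outputs only supply pseudo-labels, and only the detector learns online, with each injected Gaussian sample always labeled noisy.

```python
import math
import random

# Toy sketch under stated assumptions: the score, threshold rule, detector head
# and feature-level noise injection are illustrative choices, not AdaND's exact ones.


def unit(v):
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frozen_zero_shot(feature, text_embeds, temperature=0.01):
    """Frozen zero-shot classifier (never updated): softmax over cosine similarities
    to the ID class text embeddings. Returns (predicted class, max softmax prob)."""
    f = unit(feature)
    logits = [sum(a * b for a, b in zip(f, unit(t))) / temperature for t in text_embeds]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


class NoiseDetector:
    """Hypothetical detector: logistic regression on frozen features giving
    P(noisy | x). It is the only component updated at test time."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_noisy(self, feature):
        x = unit(feature)
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, feature, label):
        """One SGD step on binary cross-entropy (label 1 = noisy, 0 = clean)."""
        x = unit(feature)
        grad = self.prob_noisy(feature) - label
        self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * grad


def adapt_step(detector, feature, text_embeds, tau=0.7, noise_std=1.0, rng=random):
    """Handle one test sample: pseudo-label it with the frozen model, update the
    detector on it and on one injected Gaussian-noise sample, then decide."""
    pred, score = frozen_zero_shot(feature, text_embeds)
    pseudo_label = 1 if score < tau else 0  # low frozen confidence -> "noisy"
    detector.update(feature, pseudo_label)
    gaussian = [rng.gauss(0.0, noise_std) for _ in feature]
    detector.update(gaussian, 1)  # injected noise is always labeled noisy
    if detector.prob_noisy(feature) > 0.5:
        return None  # rejected as a noisy sample
    return pred  # accepted: the frozen classifier's prediction is kept
```

In this sketch a rejected sample returns `None`, and every accepted sample keeps the frozen classifier's prediction, so ID accuracy can only be lost through detection errors, never through drift of the classifier itself.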
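The two headline metrics can also be made concrete. The sketch below assumes that $\text{Acc}_\text{H}$ is the harmonic mean of an accuracy on clean samples (a clean sample counts only if it is kept and correctly classified; called Acc_S here) and an accuracy on noisy samples (a noisy sample counts if it is rejected; called Acc_N here). The record format and the function names are hypothetical. FPR95 follows its usual OOD-detection definition: the false-positive rate on OOD samples at the threshold that keeps 95% of ID samples, so lower is better.

```python
# Assumed definitions of Acc_S / Acc_N; FPR95 uses its standard OOD definition.


def harmonic_mean(a, b):
    """Harmonic mean of two accuracies; 0 if both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def ntta_accuracies(records):
    """records: (true_label, prediction) pairs, where true_label is None for a noisy
    sample and prediction is None when the sample was rejected as noisy.
    Returns (Acc_S on clean samples, Acc_N on noisy samples, Acc_H)."""
    clean = [(y, p) for y, p in records if y is not None]
    noisy = [p for y, p in records if y is None]
    # A clean sample counts only if it is kept AND classified correctly.
    acc_s = sum(p == y for y, p in clean) / len(clean)
    # A noisy sample counts if it is rejected.
    acc_n = sum(p is None for p in noisy) / len(noisy)
    return acc_s, acc_n, harmonic_mean(acc_s, acc_n)


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: share of OOD samples accepted as ID at the threshold that accepts at
    least 95% of ID samples (higher score = more ID-like)."""
    ranked = sorted(id_scores, reverse=True)
    k = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    threshold = ranked[k - 1]  # lowest ID score that must still be accepted
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)
```

The harmonic mean punishes imbalance: a method that rejects nothing has Acc_N = 0 and therefore Acc_H = 0, however well it classifies the clean samples.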