Large-scale pre-trained Vision-Language Models (VLMs) have exhibited impressive zero-shot performance and transferability, allowing them to adapt to downstream tasks in a data-efficient manner. However, when only a few labeled samples are available, adapting VLMs to distinguish subtle differences between similar classes in specific downstream tasks remains challenging. In this work, we propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit the task-specific knowledge from few-shot labeled samples. Unlike previous methods that focus on identifying a set of representative positive features defining "what is a {CLASS}", SimNL discovers a complementary set of negative features that define "what is not a {CLASS}", providing additional insights that supplement the positive features to enhance task-specific recognition capability. Further, we identify that current adaptation approaches are particularly vulnerable to potential noise in the few-shot sample set. To mitigate this issue, we introduce a plug-and-play few-shot instance reweighting technique to suppress noisy outliers and amplify clean samples for more stable adaptation. Our extensive experimental results across 15 datasets validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks while achieving competitive computational efficiency. Code is available at https://github.com/zhangce01/SimNL.
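The core idea of combining positive features ("what is a {CLASS}") with complementary negative features ("what is not a {CLASS}") can be illustrated with a minimal sketch. This is not the SimNL implementation; the prototype representation, the cosine-similarity scoring, and the weighting factor `alpha` are illustrative assumptions.

```python
import numpy as np

def classify_with_negative_features(x, pos_protos, neg_protos, alpha=1.0):
    """Score each class by similarity to its positive prototype minus
    similarity to its negative ("what is not a {CLASS}") prototype.

    Illustrative sketch only, not the actual SimNL method:
      x          -- (D,) image feature
      pos_protos -- (C, D) per-class positive feature prototypes
      neg_protos -- (C, D) per-class negative feature prototypes
      alpha      -- assumed weight balancing the negative term
    """
    def cos(a, b):
        # Cosine similarity between rows of a and rows of b.
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T

    pos_sim = cos(x[None, :], pos_protos)[0]  # (C,) "looks like class c"
    neg_sim = cos(x[None, :], neg_protos)[0]  # (C,) "looks unlike class c"
    # High positive similarity and low negative similarity both raise the score.
    return pos_sim - alpha * neg_sim
```

A sample aligned with a class's positive prototype but orthogonal to its negative prototype receives the highest score, capturing how the negative features supplement, rather than replace, the positive ones.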