Exploiting Category Names for Few-Shot Classification with Vision-Language Models - 专知论文

会员服务 ·

0

视觉语言模型 · 少样本分类 · 类别 · 样本 · 多视觉 ·

2023 年 4 月 18 日

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

翻译：利用类别名称提升视觉-语言模型的少样本分类性能

Taihong Xiao,Zirui Wang,Liangliang Cao,Jiahui Yu,Shengyang Dai,Ming-Hsuan Yang

Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models build two encoders (visual and textual) that can map two modalities into the same embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, the potential of large vision-language models is often underperformed, mainly due to the gap between a large number of parameters and a relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. With the proposed category name initialization method, our model obtains the state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning).

翻译：视觉-语言基础模型在大规模数据上的预训练为许多视觉理解任务提供了强大工具。值得注意的是，许多视觉-语言模型构建了双编码器（视觉编码器和文本编码器），能够将两种模态映射到同一嵌入空间。因此，学习到的表征在图像分类等任务上展现出良好的零样本性能。然而，当每个类别仅有少量样本时，大型视觉-语言模型的潜力往往未能充分发挥，这主要归因于大量参数与相对较少的训练数据之间的差距。本文表明，通过使用类别名称初始化分类头，我们可以显著提升少样本分类的性能。采用所提出的类别名称初始化方法，我们的模型在多个少样本图像分类基准上取得了最先进的性能（例如，在ImageNet上达到87.37%，在Stanford Cars上达到96.08%，均采用五样本学习）。

0

相关内容

视觉语言模型

视觉语言模型

【ACL2022教程】有限文本数据学习，Learning with Limited Text Data

【ACL2022教程】有限文本数据学习，Learning with Limited Text Data

专知会员服务

29+阅读 · 2022年5月22日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【微软亚研】预训练文本表示作为元学习，Pre-training Text Representations

【微软亚研】预训练文本表示作为元学习，Pre-training Text Representations

专知会员服务

40+阅读 · 2020年4月17日

【ICML2020投稿论文】用于半监督图像分类的CowMask，Milking CowMask for Semi-Supervised Image Classification

【ICML2020投稿论文】用于半监督图像分类的CowMask，Milking CowMask for Semi-Supervised Image Classification

专知会员服务

29+阅读 · 2020年3月27日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

专知会员服务

68+阅读 · 2020年2月25日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

专知会员服务

24+阅读 · 2019年12月21日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知

16+阅读 · 2020年5月31日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

22篇论文！增量学习/终生学习论文资源列表

22篇论文！增量学习/终生学习论文资源列表

专知

32+阅读 · 2018年12月27日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

基于四元数的彩色视频去噪方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Grouplet变换的航空构件断口图像识别新方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

计算机辅助体训中基于形变度与运动分解的运动员3-D形态与运动信息识别

国家自然科学基金

0+阅读 · 2012年12月31日

弱监督条件下RGB-D时序图像的语义分割模型与迁移学习算法

国家自然科学基金

0+阅读 · 2012年12月31日

通用Java程序到实时Java程序的对象自动分类和转化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

多模式合成孔径激光雷达成像原理与实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

多目标优化Pareto支配性的模式识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

模式识别中多元数据的对偶射影空间表示和可视化问题

国家自然科学基金

0+阅读 · 2008年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Arxiv

0+阅读 · 2023年6月5日

Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models

Arxiv

0+阅读 · 2023年6月5日

MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Arxiv

0+阅读 · 2023年6月2日

Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation

Arxiv

0+阅读 · 2023年6月2日

Vocabulary-free Image Classification

Arxiv

0+阅读 · 2023年6月1日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

Few-Shot Graph Classification with Model Agnostic Meta-Learning

Arxiv

23+阅读 · 2020年3月18日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

How to Fine-Tune BERT for Text Classification?

How to Fine-Tune BERT for Text Classification?

Arxiv

13+阅读 · 2019年5月14日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

VIP会员

文章信息

相关主题

视觉语言模型

少样本分类

最新内容

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

6+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

5+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

7+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

7+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

7+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

9+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

8+阅读 · 6月25日

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

8+阅读 · 6月25日

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

9+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

10+阅读 · 6月24日

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

11+阅读 · 6月24日

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

10+阅读 · 6月24日

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

7+阅读 · 6月24日

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

10+阅读 · 6月24日

相关VIP内容

【ACL2022教程】有限文本数据学习，Learning with Limited Text Data

【ACL2022教程】有限文本数据学习，Learning with Limited Text Data

专知会员服务

29+阅读 · 2022年5月22日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【微软亚研】预训练文本表示作为元学习，Pre-training Text Representations

【微软亚研】预训练文本表示作为元学习，Pre-training Text Representations

专知会员服务

40+阅读 · 2020年4月17日

【ICML2020投稿论文】用于半监督图像分类的CowMask，Milking CowMask for Semi-Supervised Image Classification

【ICML2020投稿论文】用于半监督图像分类的CowMask，Milking CowMask for Semi-Supervised Image Classification

专知会员服务

29+阅读 · 2020年3月27日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

专知会员服务

68+阅读 · 2020年2月25日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

专知会员服务

24+阅读 · 2019年12月21日

热门VIP内容

开通专知VIP会员享更多权益服务

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

网状网络及其在军事领域的运用

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

相关资讯

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知

16+阅读 · 2020年5月31日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

22篇论文！增量学习/终生学习论文资源列表

22篇论文！增量学习/终生学习论文资源列表

专知

32+阅读 · 2018年12月27日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Arxiv

0+阅读 · 2023年6月5日

Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models

Arxiv

0+阅读 · 2023年6月5日

MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Arxiv

0+阅读 · 2023年6月2日

Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation

Arxiv

0+阅读 · 2023年6月2日

Vocabulary-free Image Classification

Arxiv

0+阅读 · 2023年6月1日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

Few-Shot Graph Classification with Model Agnostic Meta-Learning

Arxiv

23+阅读 · 2020年3月18日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

How to Fine-Tune BERT for Text Classification?

How to Fine-Tune BERT for Text Classification?

Arxiv

13+阅读 · 2019年5月14日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

相关基金

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

基于四元数的彩色视频去噪方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于Grouplet变换的航空构件断口图像识别新方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

计算机辅助体训中基于形变度与运动分解的运动员3-D形态与运动信息识别

国家自然科学基金

0+阅读 · 2012年12月31日

弱监督条件下RGB-D时序图像的语义分割模型与迁移学习算法

国家自然科学基金

0+阅读 · 2012年12月31日

通用Java程序到实时Java程序的对象自动分类和转化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

多模式合成孔径激光雷达成像原理与实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

多目标优化Pareto支配性的模式识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

模式识别中多元数据的对偶射影空间表示和可视化问题

国家自然科学基金

0+阅读 · 2008年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员