Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching - 专知论文

会员服务 ·

0

少样本学习 · 样本 · 监督 · 参数匹配 · 分层编码 ·

2023 年 3 月 27 日

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

翻译：视觉标记匹配的通用密集预测任务少样本学习

Donggyun Kim,Jinwoo Kim,Seongwoong Cho,Chong Luo,Seunghoon Hong

Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that is able to flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels that encapsulates all tasks. Also, VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones where token matching is performed at multiple feature hierarchies. We experiment VTM on a challenging variant of Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms using 0.1% of full supervision. Codes are available at https://github.com/GitGyun/visual_token_matching.

翻译：密集预测任务是计算机视觉中的一类基础问题。由于监督方法需要高昂的逐像素标注成本，因此我们期望一种能从少量标注图像中学习任意密集任务的少样本学习方法。然而，当前的少样本学习方法仅针对语义分割等有限任务，这可能是由于设计一个通用且统一的模型面临挑战，该模型需能灵活高效地适应任意未见语义的任务。我们提出了视觉标记匹配（VTM），一种用于任意密集预测任务的通用少样本学习器。它采用非参数匹配方法，对图像和标签的块级嵌入标记进行匹配，从而囊括所有任务。此外，VTM通过少量任务特定参数灵活适应任意任务，这些参数可调节匹配算法。我们将VTM实现为一种强大的层次化编码器-解码器架构，采用ViT骨干网络，并在多个特征层级上执行标记匹配。我们在Taskonomy数据集的一个具有挑战性的变体上实验VTM，观察到它能够稳健地对各种未见密集预测任务进行少样本学习。令人惊讶的是，它仅使用新任务的10个标注样本（占全监督的0.004%）即可与全监督基线相当，有时甚至在使用0.1%全监督样本时表现更优。代码可在https://github.com/GitGyun/visual_token_matching获取。

0

相关内容

少样本学习

少样本学习

【NeurIPS2022】不用微调的加速大规模视觉Transformer的密集预测

【NeurIPS2022】不用微调的加速大规模视觉Transformer的密集预测

专知会员服务

14+阅读 · 2022年10月5日

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

专知会员服务

32+阅读 · 2022年3月12日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

SiT: 自监督视觉Transformer

专知会员服务

65+阅读 · 2021年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【斯坦福大学-ICLR2020】图神经网络预训练的策略，Strategies for Pre-training Graph Neural Networks

【斯坦福大学-ICLR2020】图神经网络预训练的策略，Strategies for Pre-training Graph Neural Networks

专知会员服务

78+阅读 · 2020年3月1日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

论文浅尝 | 弱监督下极简的视觉语言预训练模型

论文浅尝 | 弱监督下极简的视觉语言预训练模型

开放知识图谱

1+阅读 · 2022年9月26日

ECCV 2022 | SSP: 自支持匹配的小样本任务新思想

ECCV 2022 | SSP: 自支持匹配的小样本任务新思想

PaperWeekly

0+阅读 · 2022年7月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

文本+视觉，多篇 Visual/Video BERT 论文介绍

文本+视觉，多篇 Visual/Video BERT 论文介绍

AI科技评论

22+阅读 · 2019年8月30日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

面向可穿戴电子的可拉伸弹性网格储能器件的研究

国家自然科学基金

0+阅读 · 2015年12月31日

融合多尺度稀疏与稠密特征结构的透视不变图像匹配模型研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于意图信息共享的航空器4D航迹预测方法

国家自然科学基金

0+阅读 · 2013年12月31日

基于机器学习的AbS-LPC压缩语音隐写分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于TP模型变换方法的视觉伺服控制技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

数据中心Fat-Tree批量调度光包交换新架构

国家自然科学基金

2+阅读 · 2012年12月31日

煤矿井下基于空时处理的编码协作MC-CDMA跨层自适应无线传输技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向综合力学环境预测的回归多任务学习研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于二维随机映射和一范数优化的有监督图像分类研究

国家自然科学基金

3+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Arxiv

0+阅读 · 2023年5月16日

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

Arxiv

0+阅读 · 2023年5月16日

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Arxiv

0+阅读 · 2023年5月12日

Hybrid Curriculum Learning for Emotion Recognition in Conversation

Arxiv

14+阅读 · 2021年12月22日

GAN-Supervised Dense Visual Alignment

Arxiv

10+阅读 · 2021年12月9日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Matching Networks for One Shot Learning

Arxiv

10+阅读 · 2017年12月29日

VIP会员

文章信息

相关主题

少样本学习

最新内容

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

3+阅读 · 今天14:49

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

3+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

5+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

15+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

相关VIP内容

【NeurIPS2022】不用微调的加速大规模视觉Transformer的密集预测

【NeurIPS2022】不用微调的加速大规模视觉Transformer的密集预测

专知会员服务

14+阅读 · 2022年10月5日

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

【CVPR 2022】视觉提示调整（VPT），Vision Prompt Tuning

专知会员服务

32+阅读 · 2022年3月12日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

SiT: 自监督视觉Transformer

专知会员服务

65+阅读 · 2021年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【斯坦福大学-ICLR2020】图神经网络预训练的策略，Strategies for Pre-training Graph Neural Networks

【斯坦福大学-ICLR2020】图神经网络预训练的策略，Strategies for Pre-training Graph Neural Networks

专知会员服务

78+阅读 · 2020年3月1日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

热门VIP内容

开通专知VIP会员享更多权益服务

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

相关资讯

论文浅尝 | 弱监督下极简的视觉语言预训练模型

论文浅尝 | 弱监督下极简的视觉语言预训练模型

开放知识图谱

1+阅读 · 2022年9月26日

ECCV 2022 | SSP: 自支持匹配的小样本任务新思想

ECCV 2022 | SSP: 自支持匹配的小样本任务新思想

PaperWeekly

0+阅读 · 2022年7月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

文本+视觉，多篇 Visual/Video BERT 论文介绍

文本+视觉，多篇 Visual/Video BERT 论文介绍

AI科技评论

22+阅读 · 2019年8月30日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

相关论文

Learning Higher-order Object Interactions for Keypoint-based Video Understanding

Arxiv

0+阅读 · 2023年5月16日

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

Arxiv

0+阅读 · 2023年5月16日

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Arxiv

0+阅读 · 2023年5月12日

Hybrid Curriculum Learning for Emotion Recognition in Conversation

Arxiv

14+阅读 · 2021年12月22日

GAN-Supervised Dense Visual Alignment

Arxiv

10+阅读 · 2021年12月9日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Matching Networks for One Shot Learning

Arxiv

10+阅读 · 2017年12月29日

相关基金

面向可穿戴电子的可拉伸弹性网格储能器件的研究

国家自然科学基金

0+阅读 · 2015年12月31日

融合多尺度稀疏与稠密特征结构的透视不变图像匹配模型研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于意图信息共享的航空器4D航迹预测方法

国家自然科学基金

0+阅读 · 2013年12月31日

基于机器学习的AbS-LPC压缩语音隐写分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于TP模型变换方法的视觉伺服控制技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

数据中心Fat-Tree批量调度光包交换新架构

国家自然科学基金

2+阅读 · 2012年12月31日

煤矿井下基于空时处理的编码协作MC-CDMA跨层自适应无线传输技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向综合力学环境预测的回归多任务学习研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于二维随机映射和一范数优化的有监督图像分类研究

国家自然科学基金

3+阅读 · 2011年12月31日

基于Sparse-Land模型的SAR图像噪声抑制与分割

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员