Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting - 专知论文

会员服务 ·

0

小样本物体检测 · 小样本 · 物体检测 · 元学习 · 模态 ·

2023 年 3 月 27 日

Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

翻译：多模态小样本目标检测：基于元学习的跨模态提示方法

Guangxing Han,Long Chen,Jiawei Ma,Shiyuan Huang,Rama Chellappa,Shih-Fu Chang

from arxiv, 17 pages

We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection, which are complementary to each other by definition. Most of the previous works on multi-modal FSOD are fine-tuning-based which are inefficient for online applications. Moreover, these methods usually require expertise like class names to extract class semantic embedding, which are hard to get for rare classes. Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning to learn generalizable few-shot and zero-shot object detection models respectively without fine-tuning. Specifically, we combine the few-shot visual classifier and text classifier learned via meta-learning and prompt-based learning respectively to build the multi-modal classifier and detection models. In addition, to fully exploit the pre-trained language models, we propose meta-learning-based cross-modal prompting to generate soft prompts for novel classes present in few-shot visual examples, which are then used to learn the text classifier. Knowledge distillation is introduced to learn the soft prompt generator without using human prior knowledge of class names, which may not be available for rare classes. Our insight is that the few-shot support images naturally include related context information and semantics of the class. We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.

翻译：本文研究多模态小样本目标检测（FSOD），利用少量视觉样本和类别语义信息进行检测，两者在定义上具有互补性。以往的多模态FSOD方法大多基于微调，难以适用于在线应用场景。此外，这些方法通常需要借助类别名称等专业知识来提取类别语义嵌入，这对稀有类别而言难以获取。受度量元学习与提示学习在高层次概念上的相似性启发，本文方法无需微调即可分别学习具有泛化能力的小样本和零样本目标检测模型。具体而言，我们将通过元学习获得的小样本视觉分类器与基于提示学习的文本分类器相结合，构建多模态分类器与检测模型。为充分利用预训练语言模型，我们提出基于元学习的跨模态提示生成方法，为小样本视觉示例中出现的新类别生成软提示，进而用于训练文本分类器。引入知识蒸馏技术无需使用人类关于类别名称的先验知识即可学习软提示生成器——这对稀有类别可能尤为重要。我们的核心洞察在于：小样本支持图像天然包含了类别的相关上下文信息与语义特征。我们在多个小样本目标检测基准上对提出的多模态FSOD模型进行了全面评估，取得了令人瞩目的实验结果。

0

相关内容

小样本物体检测

小样本物体检测

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

专知会员服务

21+阅读 · 2022年4月21日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

专知会员服务

18+阅读 · 2022年2月26日

浙大《深度学习低样本目标检测》综述论文

浙大《深度学习低样本目标检测》综述论文

专知会员服务

76+阅读 · 2021年12月13日

预训练模型如何用于文本挖掘？看这份KDD2021-UIUC《预训练文本表示:模型与应用在文本挖掘》教程，附200页Slides

专知会员服务

44+阅读 · 2021年8月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

专知会员服务

51+阅读 · 2020年3月17日

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

专知会员服务

69+阅读 · 2020年3月11日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知

16+阅读 · 2020年5月31日

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

专知

55+阅读 · 2020年3月11日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

基于知识迁移的有限样本模式分类研究

国家自然科学基金

5+阅读 · 2014年12月31日

高分辨率遥感影像房屋特征迁移学习与变化检测研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于混合属性分析的人体行为识别方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于人体姿态表示的动作识别方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

基于众源影像的三维街景模型增强

国家自然科学基金

0+阅读 · 2012年12月31日

形式化软件规约Radl获取、验证与确认方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

开放域动态事实性信息获取及融合方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于核酸适配体的酶分子检测法

国家自然科学基金

0+阅读 · 2009年12月31日

鼻咽癌细胞及肿瘤间质中全基因组DNA甲基化水平异化及相关致癌机理的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Task and Motion Planning with Large Language Models for Object Rearrangement

Arxiv

0+阅读 · 2023年5月16日

MsPrompt: Multi-step Prompt Learning for Debiasing Few-shot Event Detection

Arxiv

0+阅读 · 2023年5月16日

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Arxiv

5+阅读 · 2023年5月12日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Few-shot Learning for Multi-label Intent Detection

Arxiv

21+阅读 · 2020年10月11日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Few-shot Learning: A Survey

Few-shot Learning: A Survey

Arxiv

363+阅读 · 2019年4月10日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

小样本物体检测

最新内容

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

2+阅读 · 今天7:25

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

1+阅读 · 今天6:54

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

1+阅读 · 今天6:52

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

1+阅读 · 今天6:33

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

6+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

5+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

8+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

7+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

8+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

9+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

9+阅读 · 6月25日

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

9+阅读 · 6月25日

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

9+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

10+阅读 · 6月24日

相关VIP内容

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

专知会员服务

21+阅读 · 2022年4月21日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

AAAI 2022 | 基于预训练-微调框架的图像差异描述任务

专知会员服务

18+阅读 · 2022年2月26日

浙大《深度学习低样本目标检测》综述论文

浙大《深度学习低样本目标检测》综述论文

专知会员服务

76+阅读 · 2021年12月13日

预训练模型如何用于文本挖掘？看这份KDD2021-UIUC《预训练文本表示:模型与应用在文本挖掘》教程，附200页Slides

专知会员服务

44+阅读 · 2021年8月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

专知会员服务

51+阅读 · 2020年3月17日

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

专知会员服务

69+阅读 · 2020年3月11日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

近期必读的6篇 NeurIPS 2019 的零样本学习(Zero-Shot Learning)论文

专知会员服务

60+阅读 · 2019年12月24日

热门VIP内容

开通专知VIP会员享更多权益服务

巡飞弹与反无人机系统——现代战场的两大支柱

《北约数字教官网络发展路径》128页报告

无人机自主控制与人工智能：系统性综述

《打造“黄金舰队”》57页报告

相关资讯

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知

16+阅读 · 2020年5月31日

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

【三星AI-CVPR2020】增量小样本目标检测，Incremental Few-Shot Object Detection

专知

55+阅读 · 2020年3月11日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

Zero-Shot Learning相关资源大列表

Zero-Shot Learning相关资源大列表

专知

52+阅读 · 2019年1月1日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

相关论文

Task and Motion Planning with Large Language Models for Object Rearrangement

Arxiv

0+阅读 · 2023年5月16日

MsPrompt: Multi-step Prompt Learning for Debiasing Few-shot Event Detection

Arxiv

0+阅读 · 2023年5月16日

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Arxiv

5+阅读 · 2023年5月12日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Few-shot Learning for Multi-label Intent Detection

Arxiv

21+阅读 · 2020年10月11日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Few-shot Learning: A Survey

Few-shot Learning: A Survey

Arxiv

363+阅读 · 2019年4月10日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

相关基金

基于知识迁移的有限样本模式分类研究

国家自然科学基金

5+阅读 · 2014年12月31日

高分辨率遥感影像房屋特征迁移学习与变化检测研究

国家自然科学基金

4+阅读 · 2013年12月31日

基于混合属性分析的人体行为识别方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于人体姿态表示的动作识别方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

基于众源影像的三维街景模型增强

国家自然科学基金

0+阅读 · 2012年12月31日

形式化软件规约Radl获取、验证与确认方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

开放域动态事实性信息获取及融合方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于核酸适配体的酶分子检测法

国家自然科学基金

0+阅读 · 2009年12月31日

鼻咽癌细胞及肿瘤间质中全基因组DNA甲基化水平异化及相关致癌机理的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员