Linguistic Query-Guided Mask Generation for Referring Image Segmentation - 专知论文

会员服务 ·

0

掩码 · 图像分割 · INTERACT · Performer · prototype ·

2023 年 3 月 22 日

Linguistic Query-Guided Mask Generation for Referring Image Segmentation

翻译：面向指代图像分割的语言查询引导掩码生成

Zhichao Wei,Xiaohao Chen,Mingqiang Chen,Siyu Zhu

Referring image segmentation aims to segment the image region of interest according to the given language expression, which is a typical multi-modal task. Existing methods either adopt the pixel classification-based or the learnable query-based framework for mask generation, both of which are insufficient to deal with various text-image pairs with a fix number of parametric prototypes. In this work, we propose an end-to-end framework built on transformer to perform Linguistic query-Guided mask generation, dubbed LGFormer. It views the linguistic features as query to generate a specialized prototype for arbitrary input image-text pair, thus generating more consistent segmentation results. Moreover, we design several cross-modal interaction modules (\eg, vision-language bidirectional attention module, VLBA) in both encoder and decoder to achieve better cross-modal alignment.

翻译：指代图像分割旨在根据给定的语言表达分割出图像中的感兴趣区域，这是一项典型的多模态任务。现有方法要么采用基于像素分类的框架，要么采用基于可学习查询的框架来生成掩码，这两种方法都难以用固定数量的参数原型来处理各种文本-图像对。在这项工作中，我们提出了一种基于Transformer的端到端框架，用于执行语言查询引导的掩码生成，命名为LGFormer。它将语言特征视为查询，为任意输入图像-文本对生成专门的原型，从而产生更一致的分割结果。此外，我们在编码器和解码器中设计了多个跨模态交互模块（例如，视觉-语言双向注意力模块VLBA），以实现更好的跨模态对齐。

0

相关内容

【CVPR 2022】基于Transformer的图象风格化，StyTr2: Image Style Transfer with Transformers

【CVPR 2022】基于Transformer的图象风格化，StyTr2: Image Style Transfer with Transformers

专知会员服务

11+阅读 · 2022年3月19日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【CVPR2020-牛津大学】具有自适应邻域一致性的通信网络，Correspondence Networks with Adaptive Neighbourhood Consensus

【CVPR2020-牛津大学】具有自适应邻域一致性的通信网络，Correspondence Networks with Adaptive Neighbourhood Consensus

专知会员服务

16+阅读 · 2020年3月27日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

Heisenberg群与Minkowski空间中的非线性椭圆方程

国家自然科学基金

0+阅读 · 2014年12月31日

空间插值的微分几何方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

核酸适配体aptamer原位募集骨髓间充质干细胞在兔胫骨缺损修复中的研究

国家自然科学基金

0+阅读 · 2013年12月31日

融合多尺度上下文的图像标注研究

国家自然科学基金

2+阅读 · 2013年12月31日

奇性空间上的几何分析

国家自然科学基金

0+阅读 · 2013年12月31日

基于WorldView-3和OP-ELM的矿化蚀变提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于云计算的建筑全生命期BIM集成与应用关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于LDPC的Relay系统译码和信号星座协作理论与技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Tetrolet变换的偏振遥感图像融合算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters

Arxiv

0+阅读 · 2023年5月12日

Explore the Power of Synthetic Data on Few-shot Object Detection

Arxiv

0+阅读 · 2023年5月12日

Transformers in Time Series: A Survey

Arxiv

0+阅读 · 2023年5月11日

iEdit: Localised Text-guided Image Editing with Weak Supervision

Arxiv

0+阅读 · 2023年5月10日

Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment

Arxiv

0+阅读 · 2023年5月10日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Cross-Modal Self-Attention Network for Referring Image Segmentation

Cross-Modal Self-Attention Network for Referring Image Segmentation

Arxiv

18+阅读 · 2019年4月9日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

1+阅读 · 今天13:56

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

1+阅读 · 今天13:54

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

4+阅读 · 今天8:18

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

3+阅读 · 今天7:39

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

9+阅读 · 今天7:33

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

3+阅读 · 今天7:28

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

6+阅读 · 今天7:14

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

19+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

8+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

9+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

9+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

8+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

8+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

相关VIP内容

【CVPR 2022】基于Transformer的图象风格化，StyTr2: Image Style Transfer with Transformers

【CVPR 2022】基于Transformer的图象风格化，StyTr2: Image Style Transfer with Transformers

专知会员服务

11+阅读 · 2022年3月19日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【CVPR2020-牛津大学】具有自适应邻域一致性的通信网络，Correspondence Networks with Adaptive Neighbourhood Consensus

【CVPR2020-牛津大学】具有自适应邻域一致性的通信网络，Correspondence Networks with Adaptive Neighbourhood Consensus

专知会员服务

16+阅读 · 2020年3月27日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

多模态代码智能综述：从视觉输入到可执行代码系统

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters

Arxiv

0+阅读 · 2023年5月12日

Explore the Power of Synthetic Data on Few-shot Object Detection

Arxiv

0+阅读 · 2023年5月12日

Transformers in Time Series: A Survey

Arxiv

0+阅读 · 2023年5月11日

iEdit: Localised Text-guided Image Editing with Weak Supervision

Arxiv

0+阅读 · 2023年5月10日

Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment

Arxiv

0+阅读 · 2023年5月10日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Cross-Modal Self-Attention Network for Referring Image Segmentation

Cross-Modal Self-Attention Network for Referring Image Segmentation

Arxiv

18+阅读 · 2019年4月9日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

相关基金

Heisenberg群与Minkowski空间中的非线性椭圆方程

国家自然科学基金

0+阅读 · 2014年12月31日

空间插值的微分几何方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

核酸适配体aptamer原位募集骨髓间充质干细胞在兔胫骨缺损修复中的研究

国家自然科学基金

0+阅读 · 2013年12月31日

融合多尺度上下文的图像标注研究

国家自然科学基金

2+阅读 · 2013年12月31日

奇性空间上的几何分析

国家自然科学基金

0+阅读 · 2013年12月31日

基于WorldView-3和OP-ELM的矿化蚀变提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于云计算的建筑全生命期BIM集成与应用关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于LDPC的Relay系统译码和信号星座协作理论与技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Tetrolet变换的偏振遥感图像融合算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员