ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers - 专知论文

会员服务 ·

0

Boosting（一种模型训练加速方式） · Vision · Performer · 变换 · 卷积 ·

2023 年 5 月 24 日

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

翻译：ViTMatte：利用预训练纯视觉Transformer提升图像抠图性能

Jingfeng Yao,Xinggang Wang,Shusheng Yang,Baoyuan Wang

Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining. However, they have not yet conquered the problem of image matting. We hypothesize that image matting could also be boosted by ViTs and present a new efficient and robust ViT-based matting system, named ViTMatte. Our method utilizes (i) a hybrid attention mechanism combined with a convolution neck to help ViTs achieve an excellent performance-computation trade-off in matting tasks. (ii) Additionally, we introduce the detail capture module, which just consists of simple lightweight convolutions to complement the detailed information required by matting. To the best of our knowledge, ViTMatte is the first work to unleash the potential of ViT on image matting with concise adaptation. It inherits many superior properties from ViT to matting, including various pretraining strategies, concise architecture design, and flexible inference strategies. We evaluate ViTMatte on Composition-1k and Distinctions-646, the most commonly used benchmark for image matting, our method achieves state-of-the-art performance and outperforms prior matting works by a large margin.

翻译：近年来，纯视觉Transformer（ViT）凭借其强大的建模能力和大规模预训练，在各类计算机视觉任务中展现出卓越性能。然而，这类模型尚未攻克图像抠图问题。我们假设ViT同样能推动图像抠图的发展，并提出一种基于ViT的高效稳健抠图系统——ViTMatte。该方法包含：（i）混合注意力机制结合卷积颈，帮助ViT在抠图任务中实现性能与计算量的出色平衡；（ii）引入仅由轻量卷积构成的细节捕获模块，用于补充抠图所需的细节信息。据我们所知，ViTMatte是首个通过简洁适配挖掘ViT在图像抠图领域潜力的工作，它继承了ViT的多种优越特性，包括多样化的预训练策略、简明的架构设计以及灵活的推理策略。在图像抠图领域最常用的基准数据集Composition-1k和Distinctions-646上，ViTMatte取得了最先进的性能，并以显著优势超越了先前所有抠图方法。

0

相关内容

Boosting（一种模型训练加速方式）

Boosting（一种模型训练加速方式）

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

86+阅读 · 2023年6月19日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

326+阅读 · 2020年11月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

专知

16+阅读 · 2018年2月13日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

液相激光溶蚀技术制备新型光磁双模态成像探针应用于肿瘤靶向显像研究

国家自然科学基金

0+阅读 · 2014年12月31日

DGKε/SNARE信号通路在糖尿病肾病足细胞胰岛素抵抗中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

E3连接酶AtTR1在植物盐胁迫信号转导中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

A-D-A型宽光谱响应窄带隙聚合物供体材料的合成及其光伏性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

有机硅改性水泥混凝土制备，性能及机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

高效双光子荧光性能蚕丝支架的制备、发光机理及其生物成像应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于核方法的非局部图像处理

国家自然科学基金

0+阅读 · 2012年12月31日

Hedgehog信号通路在tPA加剧脑梗塞血脑屏障损害中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

碳纳米材料-核酸适体新型分子识别体系的设计及其在电化学生物传感中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

农业用水效用评价方法及其尺度效应研究

国家自然科学基金

0+阅读 · 2009年12月31日

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Arxiv

0+阅读 · 2023年7月10日

Art Authentication with Vision Transformers

Arxiv

0+阅读 · 2023年7月10日

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

Arxiv

0+阅读 · 2023年7月9日

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Arxiv

0+阅读 · 2023年7月7日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Arxiv

14+阅读 · 2021年4月27日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

VIP会员

文章信息

相关主题

Boosting（一种模型训练加速方式）

最新内容

ICML 2026 | SARDI：扩散语言模型的自增强检索

ICML 2026 | SARDI：扩散语言模型的自增强检索

专知会员服务

4+阅读 · 6月6日

长时程具身智能安全综述：机器人操作的跨层分析

长时程具身智能安全综述：机器人操作的跨层分析

专知会员服务

4+阅读 · 6月6日

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

专知会员服务

9+阅读 · 6月6日

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

专知会员服务

5+阅读 · 6月6日

《国防领域安全采用大语言模型的战略蓝图》

《国防领域安全采用大语言模型的战略蓝图》

专知会员服务

7+阅读 · 6月6日

《对抗性电磁环境下远程巡飞弹作战的保密指挥控制数据链》

《对抗性电磁环境下远程巡飞弹作战的保密指挥控制数据链》

专知会员服务

6+阅读 · 6月6日

CVPR2026奖项公布，谷歌D4RT最佳论文获奖，何恺明ResNet、YOLO获时间检验奖！

CVPR2026奖项公布，谷歌D4RT最佳论文获奖，何恺明ResNet、YOLO获时间检验奖！

专知会员服务

5+阅读 · 6月6日

ICML 2026 | 演化选择的因果建模

ICML 2026 | 演化选择的因果建模

专知会员服务

7+阅读 · 6月5日

综述｜学习式3D表征最新进展与趋势

综述｜学习式3D表征最新进展与趋势

专知会员服务

7+阅读 · 6月5日

《武器作战效能分析：基于虚拟构造仿真大数据与深度学习的初步见解》

《武器作战效能分析：基于虚拟构造仿真大数据与深度学习的初步见解》

专知会员服务

7+阅读 · 6月5日

《自主巡飞弹药系统量子逻辑框架：一种基于不确定模糊集的方法》

《自主巡飞弹药系统量子逻辑框架：一种基于不确定模糊集的方法》

专知会员服务

7+阅读 · 6月5日

人工智能重塑威慑：算法优势的兴起

人工智能重塑威慑：算法优势的兴起

专知会员服务

8+阅读 · 6月5日

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

【博士论文】基于物理结构与贝叶斯不确定性的可靠神经网络

专知会员服务

14+阅读 · 6月4日

AgentOps综述：智能体系统运维框架

AgentOps综述：智能体系统运维框架

专知会员服务

17+阅读 · 6月4日

《美陆军最新条令：兵力防护》

《美陆军最新条令：兵力防护》

专知会员服务

14+阅读 · 6月4日

相关VIP内容

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

86+阅读 · 2023年6月19日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

326+阅读 · 2020年11月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

长时程具身智能安全综述：机器人操作的跨层分析

《锻造军官能力：军官发展的军事训练、学术教育及设计思维导向创新的多维度研究》最新300页

ICML 2026 | SARDI：扩散语言模型的自增强检索

从“杀伤链”到“杀伤网”：新时代防空反导体系的真正需求

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

【论文推荐】最新7篇条件随机场（CRF）相关论文—图像标注、对抗学习、端到端、注意力机制、三维人体姿态、图像分割、行为分割和识别

专知

16+阅读 · 2018年2月13日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

相关论文

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Arxiv

0+阅读 · 2023年7月10日

Art Authentication with Vision Transformers

Arxiv

0+阅读 · 2023年7月10日

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

Arxiv

0+阅读 · 2023年7月9日

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Arxiv

0+阅读 · 2023年7月7日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Arxiv

14+阅读 · 2021年4月27日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

相关基金

液相激光溶蚀技术制备新型光磁双模态成像探针应用于肿瘤靶向显像研究

国家自然科学基金

0+阅读 · 2014年12月31日

DGKε/SNARE信号通路在糖尿病肾病足细胞胰岛素抵抗中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

E3连接酶AtTR1在植物盐胁迫信号转导中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

A-D-A型宽光谱响应窄带隙聚合物供体材料的合成及其光伏性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

有机硅改性水泥混凝土制备，性能及机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

高效双光子荧光性能蚕丝支架的制备、发光机理及其生物成像应用

国家自然科学基金

0+阅读 · 2013年12月31日

基于核方法的非局部图像处理

国家自然科学基金

0+阅读 · 2012年12月31日

Hedgehog信号通路在tPA加剧脑梗塞血脑屏障损害中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

碳纳米材料-核酸适体新型分子识别体系的设计及其在电化学生物传感中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

农业用水效用评价方法及其尺度效应研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员