RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

April 12, 2023

Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

From arXiv, 8 pages, accepted by CVPR 2023

This paper studies how to keep a vision backbone effective while removing the token mixers from its basic building blocks. Token mixers, such as self-attention in vision transformers (ViTs), are intended to exchange information between different spatial tokens, but they incur considerable computational cost and latency. However, directly removing them leads to an incomplete model structure prior and thus a significant accuracy drop. To this end, we first develop a RepIdentityFormer based on the re-parameterizing idea to study the token-mixer-free model architecture. We then explore an improved learning paradigm to break the limitations of a simple token-mixer-free backbone, and summarize the empirical practice into 5 guidelines. Equipped with the proposed optimization strategy, we are able to build an extremely simple vision backbone with encouraging performance that remains highly efficient during inference. Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy. We hope this work can serve as a starting point for the exploration of optimization-driven efficient network design. Project page: https://techmonsterwang.github.io/RIFormer/.
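
The abstract names the re-parameterizing idea without giving details, so the following is only a minimal sketch of that general idea, not the RepIdentityFormer block itself. It assumes a training-time branch made of a layer normalization followed by a separate per-channel affine transform, and shows that the affine can be folded into the normalization parameters so that inference uses a single normalization. All function names, parameter names, and numeric values here are hypothetical and chosen for illustration.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one token (a list of channel values), then scale and shift per channel."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def affine(x, scale, shift):
    """Per-channel affine transform: scale * x + shift."""
    return [s * v + t for v, s, t in zip(x, scale, shift)]


def train_branch(x, gamma, beta, scale, shift):
    """Assumed training-time form: normalization followed by a separate affine."""
    return affine(layer_norm(x, gamma, beta), scale, shift)


def fold_affine(gamma, beta, scale, shift):
    """Fold the affine into the normalization parameters.

    scale * (gamma * n + beta) + shift == (scale * gamma) * n + (scale * beta + shift)
    """
    merged_gamma = [s * g for g, s in zip(gamma, scale)]
    merged_beta = [s * b + t for b, s, t in zip(beta, scale, shift)]
    return merged_gamma, merged_beta


def infer_branch(x, merged_gamma, merged_beta):
    """Inference-time form: a single normalization with the merged parameters."""
    return layer_norm(x, merged_gamma, merged_beta)


# Illustrative values only (not taken from the paper).
x = [0.5, -1.0, 2.0, 0.25]
gamma = [1.0, 0.8, 1.2, 0.9]
beta = [0.0, 0.1, -0.1, 0.05]
scale = [1.5, 0.7, 1.1, 0.3]
shift = [0.2, -0.3, 0.0, 0.4]

merged_gamma, merged_beta = fold_affine(gamma, beta, scale, shift)
y_train = train_branch(x, gamma, beta, scale, shift)
y_infer = infer_branch(x, merged_gamma, merged_beta)
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
print("max difference between the two forms:", max_diff)
```

Because both steps are linear per channel, the two forms agree up to floating-point rounding, which is why the extra training-time parameters add no cost at inference in this sketch.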
