RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows - Zhuanzhi Papers


Random sampling · Transform · Hierarchical · Vision tasks · IR

April 13, 2023


Zhemin Zhang, Xun Gong

Recently, Transformers have shown promising performance in various vision tasks. However, the high cost of global self-attention remains a challenge for Transformers, especially in high-resolution vision tasks. Local self-attention restricts attention computation to a limited region for efficiency, but its small receptive field leads to insufficient context modeling. In this work, we introduce two new attention modules to enhance the global modeling capability of the hierarchical vision transformer: random sampling windows (RS-Win) and important region windows (IR-Win). Specifically, RS-Win samples random image patches from a uniform distribution to compose each window, i.e., the patches in an RS-Win window can come from any position in the image. IR-Win composes each window according to the weights of the image patches in the attention map. Notably, RS-Win can capture global information throughout the entire model, even in the earlier, high-resolution stages. IR-Win enables the self-attention module to focus on important regions of the image and capture more informative features. Incorporating these designs, RSIR-Win Transformer demonstrates competitive performance on common vision tasks.
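To make the cost argument concrete, the standard complexity estimate popularized by Swin Transformer (an outside reference, not stated in this abstract) compares global multi-head self-attention (MSA) with attention inside non-overlapping $M \times M$ windows (W-MSA) on a feature map of $h \times w$ patches with $C$ channels:

$$
\Omega(\text{MSA}) = 4hwC^2 + 2(hw)^2C, \qquad \Omega(\text{W-MSA}) = 4hwC^2 + 2M^2hwC
$$

The first term (the query, key, value and output projections) is shared; the attention term grows quadratically in the number of patches $hw$ for global attention but only linearly for a fixed window size $M$. For example, with $h = w = 56$ and $M = 7$, $(hw)^2 = 3136^2 = 9{,}834{,}496$ while $M^2hw = 49 \times 3136 = 153{,}664$, a ratio of $hw / M^2 = 64$. Assuming RS-Win and IR-Win keep the number of patches per window fixed, they would fall under the W-MSA estimate; the abstract itself does not state their exact cost.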

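The abstract describes RS-Win and IR-Win only at a high level, so the following is a minimal pure-Python sketch of one possible reading, not the authors' implementation. Assumptions (all hypothetical): patches are a flat list of feature vectors; attention is single-head and uses the raw vectors as queries, keys and values with no learned projections; RS-Win draws a uniform random permutation of patch indices and cuts it into windows; IR-Win scores each patch by the total attention it receives (the column sum of an N x N attention map) and groups patches by that rank. The names `rs_windows`, `ir_windows`, `window_attention` and `windowed_attention` are invented for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention inside one window.

    Assumption: queries, keys and values are the raw tokens (no learned
    projections), to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out


def rs_windows(n: int, window_size: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical RS-Win grouping: shuffle all n patch indices with a
    uniform random permutation and cut the result into windows, so one
    window can contain patches from anywhere in the image."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i:i + window_size] for i in range(0, n, window_size)]


def ir_windows(attn: List[List[float]], window_size: int) -> List[List[int]]:
    """Hypothetical IR-Win grouping: score each patch by the total attention
    it receives (column sum of an n x n attention map), rank patches by that
    score, and cut the ranking into windows so high-scoring patches share one."""
    n = len(attn)
    importance = [sum(attn[r][c] for r in range(n)) for c in range(n)]
    ranked = sorted(range(n), key=lambda c: importance[c], reverse=True)
    return [ranked[i:i + window_size] for i in range(0, n, window_size)]


def windowed_attention(patches: List[Vector], windows: List[List[int]]) -> List[Vector]:
    """Run attention inside each index window, then write every result back
    to its patch's original position."""
    out: List[Vector] = [[] for _ in patches]
    for idx in windows:
        attended = window_attention([patches[i] for i in idx])
        for i, vec in zip(idx, attended):
            out[i] = vec
    return out


if __name__ == "__main__":
    # Toy 4x4 grid of patches with 2-d features (made-up data).
    patches = [[float(i % 4), float(i // 4)] for i in range(16)]
    print("RS-Win windows:", rs_windows(len(patches), window_size=4, seed=1))
    # Toy attention map computed from the patches themselves (assumption).
    attn = [softmax([sum(a * b for a, b in zip(q, k)) for k in patches]) for q in patches]
    print("IR-Win windows:", ir_windows(attn, window_size=4))
    ir_out = windowed_attention(patches, ir_windows(attn, window_size=4))
    print("first IR-Win output vector:", ir_out[0])
```

Under these assumptions both groupings keep `window_size` patches per window, so the attention work per window matches a fixed local window; only which patches share a window changes. RS-Win's grouping depends on the random seed (a fresh seed could, for instance, be drawn per forward pass), while IR-Win's grouping depends on whichever attention map is supplied.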

