Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting - 专知论文

会员服务 ·

0

INFORMS · 变换 · Vision · 基准 · 自注意力机制 ·

2023 年 5 月 8 日

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

翻译：现成视觉Transformer：少样本类别无关计数的惊人基线

Zhicheng Wang,Liwen Xiao,Zhiguo Cao,Hao Lu

Class-agnostic counting (CAC) aims to count objects of interest from a query image given few exemplars. This task is typically addressed by extracting the features of query image and exemplars respectively with (un)shared feature extractors and by matching their feature similarity, leading to an extract-\textit{then}-match paradigm. In this work, we show that CAC can be simplified in an extract-\textit{and}-match manner, particularly using a pretrained and plain vision transformer (ViT) where feature extraction and similarity matching are executed simultaneously within the self-attention. We reveal the rationale of such simplification from a decoupled view of the self-attention and point out that the simplification is only made possible if the query and exemplar tokens are concatenated as input. The resulting model, termed CACViT, simplifies the CAC pipeline and unifies the feature spaces between the query image and exemplars. In addition, we find CACViT naturally encodes background information within self-attention, which helps reduce background disturbance. Further, to compensate the loss of the scale and the order-of-magnitude information due to resizing and normalization in ViT, we present two effective strategies for scale and magnitude embedding. Extensive experiments on the FSC147 and the CARPK datasets show that CACViT significantly outperforms state-of-the-art CAC approaches in both effectiveness (23.60% error reduction) and generalization, which suggests CACViT provides a concise and strong baseline for CAC. Code will be available.

翻译：类别无关计数（CAC）旨在通过给定少量示例图像，从查询图像中统计感兴趣的目标数量。通常该任务通过分别使用（非）共享特征提取器提取查询图像和示例图像的特征，并匹配其特征相似度来解决，形成"提取-然后-匹配"范式。在本工作中，我们证明CAC可以简化为"提取-并-匹配"方式，尤其在使用预训练纯视觉Transformer（ViT）时，特征提取和相似度匹配可在自注意力机制中同步完成。我们从自注意力的解耦视角揭示了这种简化的原理，并指出仅当查询令牌与示例令牌被拼接为输入时，这种简化才可能实现。所得模型称为CACViT，它简化了CAC流程并统一了查询图像与示例图像的特征空间。此外，我们发现CACViT在自注意力中天然编码了背景信息，有助于减少背景干扰。为补偿ViT中图像缩放和归一化导致的尺度与量级信息损失，我们提出了两种有效的尺度与量级嵌入策略。在FSC147和CARPK数据集上的大量实验表明，CACViT在有效性（错误率降低23.60%）和泛化能力上均显著超越现有最优CAC方法，表明CACViT为CAC任务提供了简洁而强大的基线。代码将公开提供。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

30+阅读 · 2021年7月30日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

Waardenburg综合征的拷贝数变异检测及其致病机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

多秩信号稳健宽线性波束形成方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

图像恢复的非局部稀疏建模理论及算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型硅基有序纳米尺寸多孔硅复合结构气敏传感器研究

国家自然科学基金

0+阅读 · 2012年12月31日

液相还原法制备Heusler合金纳米颗粒及其结构和性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于超高分辨率视频的HEVC低复杂度模型和方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

功能化离子液体－聚合物渗透汽化复合膜的制备及其分离性能的基础研究

国家自然科学基金

0+阅读 · 2008年12月31日

可伸缩视频流的最优化决策及适配化传输

国家自然科学基金

0+阅读 · 2008年12月31日

Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

Arxiv

0+阅读 · 2023年6月22日

Data-Free Backbone Fine-Tuning for Pruned Neural Networks

Arxiv

0+阅读 · 2023年6月22日

UniHCP: A Unified Model for Human-Centric Perceptions

Arxiv

0+阅读 · 2023年6月22日

Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

Arxiv

0+阅读 · 2023年6月21日

Improved prediction of hiking speeds using a data driven approach

Arxiv

0+阅读 · 2023年6月21日

Multimodal Fusion Transformer for Remote Sensing Image Classification

Arxiv

0+阅读 · 2023年6月20日

Minimax optimal testing by classification

Arxiv

0+阅读 · 2023年6月19日

FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

Arxiv

0+阅读 · 2023年6月19日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

VIP会员

文章信息

相关主题

自注意力机制

最新内容

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

5+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

4+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

4+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

5+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

4+阅读 · 6月23日

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

6+阅读 · 6月22日

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

30+阅读 · 2021年7月30日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

重新思考无人机时代的生存能力

在人工智能加速决策环境中拓展OODA循环

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

装甲突击旅：现代战争思考、战斗与组织

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

相关论文

Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

Arxiv

0+阅读 · 2023年6月22日

Data-Free Backbone Fine-Tuning for Pruned Neural Networks

Arxiv

0+阅读 · 2023年6月22日

UniHCP: A Unified Model for Human-Centric Perceptions

Arxiv

0+阅读 · 2023年6月22日

Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

Arxiv

0+阅读 · 2023年6月21日

Improved prediction of hiking speeds using a data driven approach

Arxiv

0+阅读 · 2023年6月21日

Multimodal Fusion Transformer for Remote Sensing Image Classification

Arxiv

0+阅读 · 2023年6月20日

Minimax optimal testing by classification

Arxiv

0+阅读 · 2023年6月19日

FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

Arxiv

0+阅读 · 2023年6月19日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

相关基金

Waardenburg综合征的拷贝数变异检测及其致病机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

多秩信号稳健宽线性波束形成方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

图像恢复的非局部稀疏建模理论及算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型硅基有序纳米尺寸多孔硅复合结构气敏传感器研究

国家自然科学基金

0+阅读 · 2012年12月31日

液相还原法制备Heusler合金纳米颗粒及其结构和性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于超高分辨率视频的HEVC低复杂度模型和方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

功能化离子液体－聚合物渗透汽化复合膜的制备及其分离性能的基础研究

国家自然科学基金

0+阅读 · 2008年12月31日

可伸缩视频流的最优化决策及适配化传输

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员