Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design - 专知论文

会员服务 ·

0

MoDELS · 缩放 · Vision · 塑造 · 推断 ·

2023 年 5 月 22 日

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

翻译：让ViT变形成功：计算最优模型设计的缩放定律

Ibrahim Alabdulmohsin,Xiaohua Zhai,Alexander Kolesnikov,Lucas Beyer

from arxiv, 10 pages, 7 figures, 9 tables

Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSRCV2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, with also less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for a more informed scaling.

翻译：缩放定律最近被用于在给定计算时长下推导计算最优模型大小（参数数量）。我们推进并细化了这些方法，以推断计算最优模型形状（如宽度和深度），并在视觉变换器中成功实现。我们的形状优化视觉变换器SoViT在预训练计算量相同的情况下，取得了与两倍以上规模模型竞争的结果。例如，SoViT-400m/14在ILSRCV2012上达到90.3%的微调准确率，在相同设置下超越规模更大的ViT-g/14并接近ViT-G/14，同时推理成本不到后者的一半。我们在图像分类、字幕生成、VQA和零样本迁移等多个任务上进行了全面评估，展示了模型在广泛领域的有效性并识别了局限性。总体而言，我们的研究结果挑战了盲目扩大视觉模型的现行方法，并为更明智的缩放铺平了道路。

0

相关内容

MoDELS

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

93+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

新型基于吡咯[4,3,2-de]喹啉类STAT3抑制剂的设计、合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

芳腙基取代的杂环并嘧啶类PI3K/mTOR双重抑制剂的设计、合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向微管蛋白秋水仙碱位点的白藜芦醇-Combrestatin A-4类抑制剂的设计、合成及活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于静止卫星多星观测的场地TOA BRDF日更新算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

IMPDH为靶点的小分子抑制剂的设计、合成及活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

人胚胎干细胞来源神经前体细胞移植对APP/PS1转基因鼠海马Aβ沉积与神经发生的影响及其机制

国家自然科学基金

0+阅读 · 2012年12月31日

β-分泌酶抑制剂的设计、合成与构效关系研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

苯氨甲酸酯类AChE抑制剂的设计、合成和双重作用研究

国家自然科学基金

0+阅读 · 2008年12月31日

Set Learning for Accurate and Calibrated Models

Arxiv

0+阅读 · 2023年7月10日

ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images

Arxiv

0+阅读 · 2023年7月6日

Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning

Arxiv

0+阅读 · 2023年7月6日

Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing

Arxiv

0+阅读 · 2023年7月6日

Robust and Accurate Superquadric Recovery: a Probabilistic Approach

Arxiv

0+阅读 · 2023年7月5日

Augmented Large Language Models with Parametric Knowledge Guiding

Arxiv

20+阅读 · 2023年5月8日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Arxiv

14+阅读 · 2021年1月31日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

VIP会员

文章信息

相关主题

最新内容

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

专知会员服务

4+阅读 · 今天12:53

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

专知会员服务

3+阅读 · 今天12:39

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

专知会员服务

2+阅读 · 今天12:32

乌克兰纵深打击如何重塑俄罗斯的战略选择

乌克兰纵深打击如何重塑俄罗斯的战略选择

专知会员服务

1+阅读 · 今天12:25

《分布式太空任务对比分析与综合建模及仿真环境》120页

《分布式太空任务对比分析与综合建模及仿真环境》120页

专知会员服务

1+阅读 · 今天12:14

俄乌战争中关于中程打击无人机部署的经验启示

俄乌战争中关于中程打击无人机部署的经验启示

专知会员服务

0+阅读 · 今天12:08

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

5+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

4+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

6+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

11+阅读 · 7月22日

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

93+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

乌克兰纵深打击如何重塑俄罗斯的战略选择

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

Set Learning for Accurate and Calibrated Models

Arxiv

0+阅读 · 2023年7月10日

ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images

Arxiv

0+阅读 · 2023年7月6日

Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning

Arxiv

0+阅读 · 2023年7月6日

Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing

Arxiv

0+阅读 · 2023年7月6日

Robust and Accurate Superquadric Recovery: a Probabilistic Approach

Arxiv

0+阅读 · 2023年7月5日

Augmented Large Language Models with Parametric Knowledge Guiding

Arxiv

20+阅读 · 2023年5月8日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Arxiv

14+阅读 · 2021年1月31日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

相关基金

新型基于吡咯[4,3,2-de]喹啉类STAT3抑制剂的设计、合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

芳腙基取代的杂环并嘧啶类PI3K/mTOR双重抑制剂的设计、合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向微管蛋白秋水仙碱位点的白藜芦醇-Combrestatin A-4类抑制剂的设计、合成及活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于静止卫星多星观测的场地TOA BRDF日更新算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

IMPDH为靶点的小分子抑制剂的设计、合成及活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

人胚胎干细胞来源神经前体细胞移植对APP/PS1转基因鼠海马Aβ沉积与神经发生的影响及其机制

国家自然科学基金

0+阅读 · 2012年12月31日

β-分泌酶抑制剂的设计、合成与构效关系研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

广义Kloosterman和的均值估计

国家自然科学基金

1+阅读 · 2011年12月31日

苯氨甲酸酯类AChE抑制剂的设计、合成和双重作用研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员