ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients

April 12, 2023

Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, Radu Marculescu

From arXiv, ICLR 2023 Spotlight

Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture. However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at https://github.com/SLDGroup/ZiCo.
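
The abstract names the idea (an inverse coefficient of variation computed on gradients across samples) but does not state the formula. The sketch below is one plausible reading, not the authors' implementation: for each parameter it takes the mean of the absolute gradient across batches divided by its standard deviation, sums these ratios within a layer, and adds the log of each layer sum. The function name `inverse_cv_score`, the nested-list input layout, and the `eps` guard are all assumptions made for illustration; the linked repository is the place to find the real code.

```python
import math
from statistics import mean, pstdev


def inverse_cv_score(grads_per_layer, eps=1e-12):
    """Hypothetical ZiCo-style score (assumed form, not the official code).

    grads_per_layer[l][p][i] is the gradient of parameter p in layer l
    measured on batch i, with the network left untrained.
    """
    score = 0.0
    for layer in grads_per_layer:
        layer_sum = 0.0
        for samples in layer:
            abs_vals = [abs(g) for g in samples]
            mu = mean(abs_vals)        # mean absolute gradient across batches
            sigma = pstdev(abs_vals)   # spread of the absolute gradient
            if sigma > eps:            # skip parameters with no spread
                layer_sum += mu / sigma  # inverse coefficient of variation
        if layer_sum > 0:
            score += math.log(layer_sum)
    return score


# Toy data: one layer, two parameters, three batches each.
stable = [[[1.0, 1.1, 0.9], [2.0, 2.2, 1.8]]]   # consistent gradients
noisy = [[[1.0, -3.0, 0.1], [2.0, 0.0, 5.0]]]   # erratic gradients

stable_score = inverse_cv_score(stable)
noisy_score = inverse_cv_score(noisy)
print(stable_score, noisy_score)
```

Under this assumed form, a network whose gradients are large and consistent across batches scores higher than one whose gradients fluctuate, which is the intuition the abstract ties to convergence rate and generalization.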
