Neural Architecture Search (NAS) is widely used to automatically find the best-performing neural network among a large number of candidate architectures. To reduce search time, zero-shot NAS aims to design training-free proxies that predict the test performance of a given architecture. However, as shown recently, none of the zero-shot proxies proposed to date works consistently better than a naive proxy, namely the number of network parameters (#Params). To improve this state of affairs, as our main theoretical contribution, we first reveal how specific gradient properties across different samples affect the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo outperforms state-of-the-art (SOTA) proxies on several popular NAS benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as those found by one-shot and multi-shot NAS methods, but require much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at https://github.com/SLDGroup/ZiCo.