In recent years, zero-cost proxies have been gaining ground in neural architecture search (NAS). These methods make it possible to find the optimal neural network for a given task faster and at a lower computational cost than conventional NAS methods. Equally important, they shed some light on the internal workings of neural architectures. This paper presents a zero-cost metric that correlates strongly with train set accuracy across the NAS-Bench-101, NAS-Bench-201 and NAS-Bench-NLP benchmark datasets. Architectures are initialised with two distinct constant shared weights, one at a time. A fixed random mini-batch of data is then passed forward through the network under each initialisation. We observe that the dispersion of the outputs between the two initialisations correlates positively with trained accuracy. The correlation improves further when we normalise the dispersion by the average output magnitude. Our metric, epsilon, requires neither gradient computation nor labels. It thus decouples the NAS procedure from training hyperparameters, loss metrics and human-labelled data. Our method is easy to integrate into existing NAS algorithms and takes a fraction of a second to evaluate a single network.
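The procedure described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the choices here are assumptions: dispersion is taken as the mean absolute difference between the two outputs, and the normaliser as the mean of the two average absolute outputs. The toy ReLU MLP, the helper names `forward` and `epsilon_score`, and the constant weight values `0.1` and `0.5` are all hypothetical and stand in for a real candidate architecture and the authors' actual settings.

```python
import numpy as np


def forward(x, layer_sizes, weight):
    """Forward pass of a toy ReLU MLP in which every weight equals `weight`.

    Assumption: no biases, ReLU after every layer except the last.
    """
    h = x
    for i, n_out in enumerate(layer_sizes):
        w = np.full((h.shape[1], n_out), weight)  # constant shared weight
        h = h @ w
        if i < len(layer_sizes) - 1:
            h = np.maximum(h, 0.0)
    return h


def epsilon_score(x, layer_sizes, w1=0.1, w2=0.5):
    """Illustrative epsilon-style score; needs no gradients and no labels."""
    out1 = forward(x, layer_sizes, w1)
    out2 = forward(x, layer_sizes, w2)
    # Assumed dispersion: mean absolute difference between the two outputs.
    dispersion = np.mean(np.abs(out1 - out2))
    # Assumed normaliser: average output magnitude over both initialisations.
    magnitude = 0.5 * (np.mean(np.abs(out1)) + np.mean(np.abs(out2)))
    if not np.isfinite(magnitude) or magnitude == 0.0:
        return float("nan")  # score undefined for degenerate outputs
    return float(dispersion / magnitude)


# A fixed random mini-batch: 16 samples with 8 features each, no labels.
rng = np.random.default_rng(0)
batch = rng.standard_normal((16, 8))
score = epsilon_score(batch, [32, 32, 10])
print(f"epsilon-style score: {score:.4f}")
```

In a NAS loop, one would presumably replace the toy MLP with each candidate architecture, reuse the same `batch` for all candidates, and rank them by the resulting score. Because only two forward passes are needed per network, the cost per candidate stays small.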