Neural Architecture Search (NAS) has emerged as an effective method for automatically designing neural network architectures. Although neural architectures have achieved human-level performance on several tasks, few of them were obtained through NAS. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures with a Generative Pre-Trained (GPT) model. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus can learn the fundamental principles of building neural architectures. GPT-NAS therefore uses the GPT model to propose reasonable architecture components given a basic one. This approach substantially reduces the search space by introducing prior knowledge into the search process. Extensive experimental results show that GPT-NAS significantly outperforms seven manually designed neural architectures and thirteen architectures produced by competing NAS methods. In addition, our ablation study indicates that the proposed algorithm improves the performance of fine-tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.
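The idea that a learned prior can shrink the search space can be illustrated with a toy sketch. This is not the GPT-NAS implementation: the abstract gives no implementation details, so the sketch swaps in a simple bigram count model for the pre-trained GPT model, and the layer vocabulary `VOCAB`, the tiny `CORPUS`, and all function names are hypothetical. It only shows how proposing the next component from a prior leaves far fewer candidates than enumerating every layer sequence.

```python
from collections import Counter, defaultdict

# Hypothetical layer vocabulary and toy "corpus" of architectures (illustration only).
VOCAB = ["conv3x3", "conv1x1", "bn", "relu", "pool", "fc"]
CORPUS = [
    ["conv3x3", "bn", "relu", "conv3x3", "bn", "relu", "pool", "fc"],
    ["conv1x1", "bn", "relu", "conv3x3", "bn", "relu", "pool", "fc"],
    ["conv3x3", "bn", "relu", "pool", "conv3x3", "bn", "relu", "fc"],
]


def fit_bigram_prior(corpus):
    """Count which layer follows which. A stand-in for a pre-trained generative model."""
    counts = defaultdict(Counter)
    for arch in corpus:
        for prev, nxt in zip(arch, arch[1:]):
            counts[prev][nxt] += 1
    return counts


def propose_next(prior, prefix, top_k=2):
    """Return up to top_k most frequent successors of the last layer in prefix."""
    ranked = prior[prefix[-1]].most_common(top_k)
    return [layer for layer, _ in ranked]


def count_candidates(prior, prefix, depth, top_k=2):
    """Count sequences obtained by extending prefix by exactly `depth` proposed layers."""
    if depth == 0:
        return 1
    return sum(
        count_candidates(prior, prefix + [layer], depth - 1, top_k)
        for layer in propose_next(prior, prefix, top_k)
    )


prior = fit_bigram_prior(CORPUS)

# Unconstrained search: any of the 6 layer types at each of 4 positions.
unconstrained = len(VOCAB) ** 4

# Prior-guided search: only extend with components the prior proposes.
guided = count_candidates(prior, ["conv3x3"], depth=4)

print("unconstrained candidates:", unconstrained)
print("prior-guided candidates:", guided)
```

In this toy setting the prior rules out sequences it has never seen, such as two pooling layers in a row, so the guided count is a small fraction of the unconstrained one. A real system would still need to train and evaluate the surviving candidates.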