GPT-NAS: Neural Architecture Search with the Generative Pre-Trained Model

Neural Architecture Search (NAS) has emerged as an effective way to design neural network architectures automatically. Although neural architectures have achieved human-level performance on several tasks, few of them were obtained through NAS. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures with a Generative Pre-Trained (GPT) model. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus can learn the fundamental rules of building neural architectures. GPT-NAS therefore uses the GPT model to propose reasonable architecture components given a basic one. This approach substantially reduces the search space by introducing prior knowledge into the search process. Extensive experimental results show that GPT-NAS significantly outperforms seven manually designed neural architectures and thirteen architectures produced by competing NAS methods. In addition, our ablation study indicates that the proposed algorithm improves the performance of fine-tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.
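
The idea of letting a pre-trained generative model propose the next architecture component can be illustrated with a toy sketch. This is not the GPT-NAS implementation: a simple bigram count model stands in for the GPT model, and the corpus, layer names, and the functions `fit_bigram_prior` and `propose` are all hypothetical. It only shows how a learned prior can shrink the set of candidate components at each step compared with trying every layer type in the vocabulary.

```python
from collections import Counter, defaultdict

# Hypothetical toy "corpus" of architectures as layer-name sequences.
# These are illustrative only and do not come from the paper.
CORPUS = [
    ["conv3x3", "bn", "relu", "conv3x3", "bn", "relu", "pool"],
    ["conv3x3", "bn", "relu", "pool", "conv3x3", "bn", "relu"],
    ["conv1x1", "bn", "relu", "conv3x3", "bn", "relu", "pool"],
]

# Unconstrained search would consider every layer type at every position.
VOCAB = sorted({layer for arch in CORPUS for layer in arch})


def fit_bigram_prior(corpus):
    """Count which layer follows which (a stand-in for a pre-trained GPT)."""
    counts = defaultdict(Counter)
    for arch in corpus:
        for prev, nxt in zip(arch, arch[1:]):
            counts[prev][nxt] += 1
    return counts


def propose(prior, prefix, top_k=2):
    """Return up to top_k likely next layers given the last layer of prefix."""
    ranked = prior[prefix[-1]].most_common(top_k)
    return [layer for layer, _ in ranked]


prior = fit_bigram_prior(CORPUS)
candidates = propose(prior, ["conv3x3", "bn", "relu"])
print(f"{len(VOCAB)} layer types in total, {len(candidates)} proposed: {candidates}")
```

In this sketch the prior narrows each step from the full vocabulary to at most `top_k` candidates, which a search procedure would then evaluate; the actual method uses a GPT model rather than bigram counts.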

