As more deep learning models are deployed in real-world applications, there is a growing need to model and learn representations of neural networks themselves. An efficient representation can be used to predict target attributes of a network without actually training or deploying it, which facilitates efficient network design and deployment. Recently, inspired by the success of the Transformer, several Transformer-based representation learning frameworks have been proposed and have achieved promising performance on cell-structured models. However, graph neural network (GNN) based approaches still dominate representation learning for entire networks. In this paper, we revisit the Transformer and compare it with the GNN to analyze their different architectural characteristics. We then propose NAR-Former V2, a modified Transformer-based universal neural network representation learning model that can learn efficient representations from both cell-structured networks and entire networks. Specifically, we first treat the network as a graph and design a straightforward tokenizer to encode the network into a sequence. Then, we incorporate the inductive representation learning capability of the GNN into the Transformer, enabling the Transformer to generalize better when it encounters unseen architectures. Additionally, we introduce a series of simple yet effective modifications to enhance the Transformer's ability to learn representations from graph structures. Our proposed method surpasses the GNN-based method NNLP by a significant margin in latency estimation on the NNLQP dataset. Furthermore, for accuracy prediction on the NASBench101 and NASBench201 datasets, our method achieves performance highly comparable to that of other state-of-the-art methods.
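The abstract describes two steps: encoding a network, viewed as a graph, into a token sequence, and giving the Transformer a GNN-like inductive bias. The abstract does not specify how either step is implemented, so the following is only a minimal sketch under stated assumptions, not the NAR-Former V2 method itself. The operation vocabulary `OP_VOCAB`, the token layout, and the helper names `tokenize` and `attention_mask` are all hypothetical. Restricting attention to direct predecessors is one simple way to imitate GNN neighborhood aggregation and is assumed here purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical operation vocabulary; not taken from the paper.
OP_VOCAB: Dict[str, int] = {"input": 0, "conv3x3": 1, "relu": 2, "add": 3, "output": 4}


def tokenize(ops: List[str], edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Encode a DAG as one token per node: [op_id, node_index, *predecessor_indices].

    Nodes are assumed to be listed in topological order, so every edge
    (src, dst) must satisfy src < dst.
    """
    preds: Dict[int, List[int]] = {i: [] for i in range(len(ops))}
    for src, dst in edges:
        if not (0 <= src < dst < len(ops)):
            raise ValueError(f"edge {(src, dst)} violates topological order")
        preds[dst].append(src)
    return [[OP_VOCAB[op], i, *sorted(preds[i])] for i, op in enumerate(ops)]


def attention_mask(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[bool]]:
    """Boolean mask where mask[i][j] is True if token i may attend to token j.

    Each token sees only itself and its direct predecessors, which mimics
    the local neighborhood aggregation of a GNN layer (an assumption of
    this sketch, not a detail confirmed by the abstract).
    """
    mask = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in edges:
        mask[dst][src] = True
    return mask


# A toy network with a skip connection: input -> conv -> relu -> add <- input.
ops = ["input", "conv3x3", "relu", "add", "output"]
edges = [(0, 1), (1, 2), (0, 3), (2, 3), (3, 4)]

tokens = tokenize(ops, edges)
mask = attention_mask(len(ops), edges)
```

In this sketch the tokens have variable length because nodes have different numbers of predecessors; a real model would pad them or embed them to a fixed width before feeding them to a Transformer. Because the mask depends only on local connectivity, it can be built for a graph of any size, which is the sense in which such a scheme is inductive.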