Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. Recent studies have demonstrated that, when doing so, it is important to account for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks, such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks, i.e., neural networks that take the weights of other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process these graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
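To make the permutation symmetry and the graph view concrete, the following is a minimal illustrative sketch and not the GMN construction itself. It assumes a two-layer ReLU MLP, a toy graph encoding in which neurons are nodes carrying their bias and weights are directed edges, and a fixed, untrained message-passing rule with sum aggregation. All function names (`mlp_forward`, `permute_hidden`, `mlp_to_graph`, `message_passing`) and the specific message function are hypothetical choices made for illustration.

```python
import numpy as np

def mlp_forward(params, x):
    """Two-layer ReLU MLP: W2 @ relu(W1 @ x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def permute_hidden(params, perm):
    """Reorder hidden neurons: rows of W1, entries of b1, columns of W2."""
    W1, b1, W2, b2 = params
    return W1[perm], b1[perm], W2[:, perm], b2

def mlp_to_graph(params):
    """Toy encoding (an assumption): nodes = neurons (feature: bias),
    directed edges = weights (feature: weight value)."""
    W1, b1, W2, b2 = params
    n_hid, n_in = W1.shape
    n_out = W2.shape[0]
    node_feat = np.concatenate([np.zeros(n_in), b1, b2])
    # Edge order follows the row-major layout of W1 and W2.
    src = np.concatenate([np.tile(np.arange(n_in), n_hid),
                          np.tile(np.arange(n_hid), n_out) + n_in])
    dst = np.concatenate([np.repeat(np.arange(n_hid), n_in) + n_in,
                          np.repeat(np.arange(n_out), n_hid) + n_in + n_hid])
    w = np.concatenate([W1.ravel(), W2.ravel()])
    return node_feat, src, dst, w

def message_passing(graph, rounds=2):
    """Untrained message passing with sum aggregation over incoming edges."""
    h, src, dst, w = graph
    h = h.copy()
    for _ in range(rounds):
        agg = np.zeros_like(h)
        # Unbuffered scatter-add of one message per edge into its target node.
        np.add.at(agg, dst, np.tanh(w * (h[src] + 1.0)))
        h = np.tanh(h + agg)
    return h

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
params = (rng.normal(size=(n_hid, n_in)), rng.normal(size=n_hid),
          rng.normal(size=(n_out, n_hid)), rng.normal(size=n_out))
perm = rng.permutation(n_hid)
params_p = permute_hidden(params, perm)

# 1) Permuting hidden neurons leaves the network function unchanged.
x = rng.normal(size=n_in)
same_function = np.allclose(mlp_forward(params, x), mlp_forward(params_p, x))

# 2) Node embeddings are equivariant on hidden nodes and invariant on outputs.
h = message_passing(mlp_to_graph(params))
h_p = message_passing(mlp_to_graph(params_p))
hidden = slice(n_in, n_in + n_hid)
hidden_equivariant = np.allclose(h[hidden][perm], h_p[hidden])
outputs_invariant = np.allclose(h[n_in + n_hid:], h_p[n_in + n_hid:])

print(same_function, hidden_equivariant, outputs_invariant)
```

Because sum aggregation does not depend on the order of incoming edges, relabeling the hidden neurons only relabels the corresponding node embeddings. This is the property that a graph-based metanetwork obtains by construction, without architecture-specific weight-sharing rules.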