Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies have demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks, such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks: neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
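The idea of representing a network as a graph can be illustrated for a plain MLP. What follows is a minimal sketch, not the GMN implementation. The parameter-graph layout (one node per neuron, one edge per weight, with biases and layer indices as node features), the helper names `parameter_graph`, `gnn_readout`, and `permute_hidden`, and the toy fixed-weight message passing are all hypothetical choices made for illustration. The sketch checks the property the abstract emphasizes: permuting the hidden neurons of an MLP leaves both the network function and the graph-level output unchanged.

```python
import math
import random

# Hypothetical MLP representation: a list of (W, b) pairs, where W is stored
# as nested lists with shape (out_features, in_features).


def mlp_forward(layers, x):
    """Evaluate the MLP with ReLU between layers (none after the last)."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def parameter_graph(layers):
    """Hypothetical graph construction: one node per neuron, one edge per weight.

    Node features are (bias, layer index); input neurons get bias 0.
    Edges are (src, dst, weight) from neuron j in layer k to neuron i in layer k+1.
    """
    sizes = [len(layers[0][0][0])] + [len(b) for _, b in layers]
    offsets = [sum(sizes[:k]) for k in range(len(sizes))]
    nodes = [(0.0, 0)] * sizes[0]
    edges = []
    for k, (W, b) in enumerate(layers):
        nodes += [(bi, k + 1) for bi in b]
        for i, row in enumerate(W):
            for j, w in enumerate(row):
                edges.append((offsets[k] + j, offsets[k + 1] + i, w))
    return nodes, edges


def gnn_readout(nodes, edges, rounds=2):
    """Toy message passing with fixed scalar 'weights' (not a trained GNN).

    Each node sums messages from its in- and out-neighbours; the graph
    embedding is the sum of final node states, so it ignores node order.
    """
    h = [math.tanh(bias + 0.1 * layer) for bias, layer in nodes]
    for _ in range(rounds):
        agg = [0.0] * len(h)
        for src, dst, w in edges:
            agg[dst] += math.tanh(w * h[src])  # forward message
            agg[src] += math.tanh(w * h[dst])  # backward message
        h = [math.tanh(0.5 * hi + ai) for hi, ai in zip(h, agg)]
    return sum(h)


def permute_hidden(layers, k, perm):
    """Reorder the neurons output by layers[k] (a hidden layer).

    Rows of W_k and entries of b_k are permuted, and the columns of W_{k+1}
    are permuted to match, so the network function is unchanged.
    """
    W, b = layers[k]
    W_next, b_next = layers[k + 1]
    new = list(layers)
    new[k] = ([W[p] for p in perm], [b[p] for p in perm])
    new[k + 1] = ([[row[p] for p in perm] for row in W_next], b_next)
    return new


random.seed(0)
sizes = [3, 4, 2]
layers = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [random.uniform(-1, 1) for _ in range(n_out)])
    for n_in, n_out in zip(sizes, sizes[1:])
]
permuted = permute_hidden(layers, 0, [2, 0, 3, 1])

x = [0.5, -1.0, 2.0]
# Both pairs agree up to floating-point rounding.
print(mlp_forward(layers, x), mlp_forward(permuted, x))
print(gnn_readout(*parameter_graph(layers)), gnn_readout(*parameter_graph(permuted)))
```

Because the readout sums over all nodes, this toy model is also invariant to reordering input or output neurons, which is not a symmetry of the network function; a real metanetwork would need additional node features to distinguish those neurons.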